ABSTRACT



We are performing a transient, microsecond timescale radio sky survey, called
"Astropulse," using the Arecibo telescope. Astropulse searches for brief (0.4 μs
to 204.8 μs), wideband (relative to its 2.5 MHz bandwidth) radio pulses centered
at 1,420 MHz. Astropulse is a commensal (piggyback) survey, and scans the sky
between declinations of -1.33 and 38.03 degrees. We obtained 1,540 hours of
data in each of 7 beams of the ALFA receiver, with 2 polarizations per beam.
The data are 1-bit complex sampled at the Nyquist limit of 0.4 μs per sample.
Examination of timescales on the order of a few microseconds is possible because
we used coherent dedispersion, a technique that has frequently been used for
targeted observations, but has never before been associated with a radio sky survey.
The more usual technique, incoherent dedispersion, cannot resolve signals below
a minimum timescale which depends on the dispersion measure and frequency of
the signal. However, coherent dedispersion requires more intensive computation
than incoherent dedispersion. The required processing power was provided by
BOINC, the Berkeley Open Infrastructure for Network Computing. BOINC is a
distributed computing system, which allows us to utilize hundreds of thousands
of volunteers' computers to perform the necessary calculations for coherent
dedispersion. Astrophysical events that might produce brief radio pulses include giant
pulses from pulsars, RRATs, exploding primordial black holes, or new sources yet
to be imagined. Radio frequency interference (RFI) and noise contaminate the
data; these are mitigated by a number of techniques including multi-polarization
correlation, DM repetition detection, and frequency profiling.



Subject headings: radio continuum: general — extraterrestrial intelligence — pulsars: 
general — black hole physics — cosmology: early universe 
Facility: Arecibo (ALFA)



1. Introduction 



1.1. Scientific motivation 



This is an exciting time in the field of transient astronomy, both in the radio and 
in other parts of the spectrum. Improving technology allows astronomers to perform 
fast followups of transient events, store extensive digital records of observations, and 
run processor-intensive algorithms on data in real time. These advances make possible
instruments that examine optical afterglows of gamma-ray bursts (Vestrand et al. 2005) or
neutrino sources (Kowalski & Mohr 2007). High resolution digital images can be recorded
and stored quickly using current (Kaiser 2004) and planned technology (Ivezic et al.
2008). In the radio, astronomers search for transients such as orphan GRB afterglows
(Levinson et al. 2002) or radio bursts of unknown origin (Katz et al. 2003).



Our project, called "Astropulse," searches for brief, wideband radio pulses on 
timescales of microseconds to milliseconds, and surveys the entire sky visible from Arecibo 
Observatory. The idea of a short-timescale radio observation is not new. Other experiments 
are well-suited for detecting radio pulses on a microsecond timescale, or even much shorter 
scales. However, these observations are directed; they examine known phenomena. For 
instance, such an experiment might record the nanosecond structure of the signals from 
the Crab pulsar. And of course the idea of a radio survey is not new. Other experiments 
perform surveys for radio pulses over large regions of the sky. However, these observations 
examine 50 μs timescales or longer. Astropulse is the first radio survey for transient
phenomena with microsecond resolution. 



This project is made possible by Astropulse's access to unprecedented processing power, 
using the distributed computing technique. Because the interstellar medium disperses 
radio signals, all of our data must be dedispersed. We send our data to volunteers, who
perform coherent dedispersion (Lorimer & Kramer 2005; Hankins et al. 1987) using their
own computers. Then they send the results of this computation back to us, informing us
whether they detected a signal, and reporting that signal's dispersion measure, power, and
other parameters. Astropulse is processor intensive because we must perform coherent
dedispersion, whereas other surveys perform incoherent dedispersion. Coherent dedispersion
is necessary to resolve structures below 50 μs or so, depending on the dispersion measure.



We are not committed to detecting any particular astrophysical source; rather, we are 
motivated by our ability to examine an unexplored region of parameter space. However, we 
consider that we might detect evaporating primordial black holes, millisecond (or faster)
pulsars, or RRATs (rotating radio transients, McLaughlin et al. 2006). We will consider
each of these possibilities in turn. We could potentially detect pulsed communications from 
extraterrestrial civilizations, though we do not discuss this possibility herein. 



1.2. Black holes 



1.2.1. Hawking radiation 



It was proposed by Hawking (1974) that a black hole of mass $M$ emits radiation like a
black body whose temperature is given by the following relation:

$$T = \frac{\hbar c^3}{8 \pi G M k_B} \tag{1}$$

The radiant energy comes directly from the black hole's mass, and as a result, the black
hole loses mass at a rate $\dot{M} \propto -M^{-2}$. Because the black hole radiates more power as it
shrinks, we expect a burst of energy in the last moments of the black hole's life. One can 
make different assumptions about the energy distribution of the radiation from a black hole
evaporation (Carter et al. 1976). For a "hard" equation of state, with an adiabatic index
$\Gamma > 6/5$, the radiation does not reach thermal equilibrium. The standard model falls into this
category, and it would assume that the radiation behaves as a relativistic ideal gas, $\Gamma = 4/3$.
In this case, the final explosion of the black hole lasts on the order of seconds. However, for
a "soft" equation of state, as proposed by Hagedorn (1965), $\Gamma$ could be much smaller. In
this case, the explosion might happen in $10^{-7}$ seconds or less. Astropulse is ideally suited
for detecting such fast explosions.



We can integrate the radiant energy to find the total lifetime of the black hole, 
demonstrating that if the black hole is exploding now, it must have been created at a mass 
of $10^{12}$ kg or less. Such a small black hole cannot have been born from a star, and would
have to be created in the big bang (Hawking 1971), from "density perturbations in the early
universe" (MacGibbon et al. 1990).



1.2.2. Electromagnetic pulses



The total amount of energy released in the last second of the black hole's life is about
$10^{23}$ J. Most previous studies have attempted to detect this energy in the cosmic gamma
ray background (Raine & Thomas 2005). But Rees (1977) suggested that some of this
energy could be converted into a radio pulse. The idea is that as the black hole shrinks and
becomes hotter, it starts radiating more and more massive particles: first electrons
and positrons (due to pair production at the event horizon), and later heavier particles as
well. This forms a plasma fireball expanding around the black hole. As this conducting
shell expands into the ambient magnetic field, it pushes the field out of the way, creating
an electromagnetic pulse. Rees argued that for a magnetic field $B$ around $5 \times 10^{-6}$ Gauss
and a critical mass of $\sim 2 \times 10^{11}$ g, a radio pulse detectable in the 21 cm band is plausible.



An observation of these pulses would be a very significant confirmation of both Hawking 
radiation and the existence of primordial black holes (PBHs). At the very least, we can 
put a limit on the possible maximum density of evaporating black holes in the universe, 
if we make some assumptions about their distribution, and contingent on the assumption 
that they produce radio pulses. This information would be relevant to cosmological models 
describing the big bang. 



Some groups have searched for these PBHs in the radio, but many researchers have
looked for gamma-ray emission instead (Ukwatta et al. 2010). Radio and gamma-ray
surveys make very different assumptions about the PBHs' evaporation time, so their results 
are difficult or impossible to compare in a meaningful way. Ukwatta et al. describe PBH 
explosions as having timescales of seconds or minutes, whereas Astropulse is looking for 
microsecond pulses. 



1.3. Other sources 



Astropulse might also detect RRATs (McLaughlin et al. 2006) or repeating or giant
pulses from pulsars. Of these possibilities, giant pulses are the most likely. Astropulse is
optimized for pulses of 200 μs or less, but its sensitivity relative to other surveys is best at
short timescales, around 0.4 to 1.6 μs. These timescales are much too short for repeating
pulses from a pulsar (Lattimer et al. 1990), even a millisecond pulsar with a small beam
opening angle half-width, or from a RRAT.



Giant pulses, on the other hand, can have very short timescales suitable for detection 
by Astropulse. The Crab's giant pulses have structure ranging from a few nanoseconds in
duration at 2 Jy μs (Hankins et al. 2003) to 64 μs or more at 1,000 to 10,000 Jy μs, with
a typical pulse duration of a few microseconds (Popov & Stappers 2007). (See Section 3.1
for a discussion of the Jy μs unit.)



2. Telescope and instrumentation 
2.1. Sky coverage 



Arecibo Observatory scans approximately one third of the sky, between declinations
of -1.33 and 38.03 degrees. Because of this, Astropulse cannot see the galactic center
(around -29° dec) but can see 452 out of the 1826 pulsars in the ATNF pulsar database
(footnote 1), including the Crab. Astropulse is a commensal survey; this means that other surveys
control the telescope pointing, but allow Astropulse to collect data at all times. Our
partner surveys include GALFA (Galactic ALFA), discussed in Peek & Heiles (2008) and
Stanimirovic & Putnam (2006), and PALFA (Pulsar ALFA), discussed in Cordes (2008).



Our group also operates SETI@home, another commensal radio survey, and the two projects 
use the same data: a 1-bit complex sampled 2.5 MHz bandwidth centered at 1420 MHz. To 
date, we have observed for 1,540 hours with each of the 7 beams (and 2 linear polarizations 
per beam), for a total of 21,600 hours of observation time. We have been taking our primary 
set of data using the ALFA receiver from September 2006 until May 2010, for a total of 
3.7 years. This implies that we have had 1/21 of all possible observation time during those 
years. Since a good deal of Arecibo's time is dedicated to non-astronomical purposes, such 
as ionospheric science, our fraction of astronomy time is significantly larger than 1/21. 
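
The bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration; the variable names are ours, and it assumes a year of 365.25 days.

```python
# Minimal arithmetic check of the observing-time figures quoted above.
HOURS_PER_YEAR = 365.25 * 24  # assumption: Julian year

hours_per_beam = 1540.0   # hours observed with each beam
n_beams = 7               # ALFA beams
n_pols = 2                # linear polarizations per beam
survey_years = 3.7        # September 2006 to May 2010

# Total observation time summed over all beams and polarizations.
total_hours = hours_per_beam * n_beams * n_pols

# Fraction of wall-clock time during which data were being recorded.
fraction = hours_per_beam / (survey_years * HOURS_PER_YEAR)

print(f"total observation time: {total_hours:.0f} h")  # 21560 h, quoted as 21,600
print(f"observing fraction: 1/{1 / fraction:.1f}")
```

The product of 1,540 hours and 14 beam-polarization pairs is 21,560 hours, which the text rounds to 21,600.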



2.2. ALFA receiver 



The ALFA (Arecibo L-band Feed Array) receiver has 7 dual-polarization beams on the 
sky arranged in a hexagonal pattern, each with a 3.5' beamwidth. The central beam has a 
gain of 11 K/Jy, and the other beams have 8.5 K/Jy (footnote 2). The system temperature is 30 K.
The 6 peripheral beam pointings differ from the central beam by a maximum of 6.4'. 



1 http://www.atnf.csiro.au/research/pulsar/psrcat/, as of 6/29/2009
2 http://www.naic.edu/alfa/gen_info/info_obs.shtml
2.3. Downconverter



Multiple experiments use the signal from the ALFA receiver, so we split the signal
using an IF splitter. These 14 signals are attenuated by 6 to 13 decibels for purposes
of level-matching, and then enter our multibeam quadrature baseband downconverter.
Downconversion involves complex multiplication, resulting in 14 complex or 28 real
channels.



2.4. Data recorder



These 28 real channels are digitized with 1 bit precision using comparators, and the
resulting digital signals (the signs of the 28 voltages) are directed through ribbon cables to
Digital Data Acquisition (DDA) cards on a PC, which is running our software that acquires
and writes the data to disk. In addition, this software collects telescope coordinates from
the Arecibo telescope's data broadcast network, SCRAMnet. The coordinates consist of the
Right Ascension (RA) and Declination (Dec) to which the telescope is currently pointing,
as well as the time for which that RA and Dec are valid.

The data files are stored on a hot swappable SATA drive, which fills up in 14 to 20
hours of observation time. Since we are taking data 1/21 of the time, we must swap out
the SATA drive about once per two weeks. When enough drives have been collected, the
Arecibo staff ship the drives to us at Space Sciences Lab, UC Berkeley. We use 20 SATA
drives in all, each of which holds 500 or 750 GB. We also send a backup copy of each file
to NERSC, the National Energy Research Scientific Computing Center. This ensures that
we can retrieve the data at any time. In all, we have taken over 48 TB of data from ALFA
multibeam. We need a large (6 TB) disk array at Berkeley to buffer the data before it is
sent to volunteers. The volunteers' PCs then process the data and send the results back to
Berkeley. For a discussion of data processing after this point, see the section on BOINC, and
Section 3 on the dedispersion algorithm for the volunteers' client program.
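
The raw data rate implied by the recorder description can be estimated directly. The sketch below is a rough check, not a description of the recording software: it counts only the 1-bit sample payload (no headers or coordinate records), and assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes).

```python
# Rough payload-only estimate of the recorder's data rate and volume.
N_REAL_CHANNELS = 28        # 14 complex signals = 28 real channels
SAMPLE_RATE_HZ = 2.5e6      # one sample per channel every 0.4 us
BITS_PER_REAL_SAMPLE = 1    # comparator output: sign of the voltage

bytes_per_second = N_REAL_CHANNELS * SAMPLE_RATE_HZ * BITS_PER_REAL_SAMPLE / 8


def hours_to_fill(drive_bytes):
    """Observation hours needed to fill a drive of the given size."""
    return drive_bytes / bytes_per_second / 3600.0


# Hours of recording that fit on a 500 GB drive (payload only).
hours_500gb = hours_to_fill(500e9)

# Calendar days between swaps if data are taken 1/21 of the time.
days_between_swaps = hours_500gb * 21 / 24

# Total payload for 1,540 hours of recording, in TB.
total_tb = bytes_per_second * 1540 * 3600 / 1e12

print(f"{bytes_per_second / 1e6:.2f} MB/s")   # 8.75 MB/s
print(f"{hours_500gb:.1f} h per 500 GB drive")
print(f"{days_between_swaps:.1f} days between swaps")
print(f"{total_tb:.1f} TB in 1,540 h")
```

Under these assumptions a 500 GB drive holds roughly 16 hours of data and needs swapping about every two weeks, and 1,540 hours of recording amounts to a little over 48 TB, in line with the figures quoted above.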



3. Pulse detection: thresholds and dedispersion



3.1. Overview



The primary function of the Astropulse program is to dedisperse potential pulsed 
signals ("candidate pulses"), then determine whether the dedispersed pulse surpasses an 
appropriate power threshold. We will discuss the theory behind dedispersion and then the
methods we use to select the thresholds. Then we will calculate the sensitivity of Astropulse
in Jy μs, a unit of "pulse area."



The Jy μs unit refers to the pulse's flux density (in Jy) integrated over its duration
(in μs). It is called an "area" because it is calculated using this integral, which is the area
under a curve. Although the unit of flux density (Jy) is a more conventional measure of
sensitivity, the Jy μs is more meaningful in our case, because we are attempting to detect
unresolved pulses. For example, consider two pulses; one is 500 Jy and lasts 0.2 μs, and
the other is 1,000 Jy and lasts 0.1 μs. When these pulses are dispersed, they will be similar
in appearance; Astropulse cannot distinguish between them because their dedispersed
durations are shorter than Astropulse's time resolution. But Astropulse can determine that
both pulses are 100 Jy μs.



Note that when we describe a measured pulse's apparent area in Jy μs, the actual
pulse area may be different depending on any contributions to the system temperature. We 
assume a particular minimal system temperature (30 K) for the ALFA receiver, whereas we 
might have a different effective system temperature when looking at the Crab nebula. 



3.2. Dedispersion 



Between a radio pulse's source (e.g., black hole, pulsar, or ET) and our detector,
the pulse is dispersed as it travels through the interstellar medium (ISM). According to
Lorimer & Kramer (2005), the relative time delay for frequency $\nu$ is given by:

\begin{align*}
t(\nu) &= \mathcal{D} \times \mathrm{DM} / \nu^2 \tag{2} \\
\mathrm{DM} &= \int_0^d n_e \, dl \tag{3}
\end{align*}

where $d$ is the distance to the source of the pulse, $n_e$ is the electron density, and $\mathcal{D}$ is
equal to $4.15 \times 10^3$ MHz$^2$ pc$^{-1}$ cm$^3$ s.

A useful estimate for the dispersion measure weighted mean electron density in our
Galaxy is $n_e = 0.03$ cm$^{-3}$ (Guelin 1973).
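
To give a feel for the size of the effect, the sketch below evaluates the delay formula of Equation (2) across the 2.5 MHz band for the Crab pulsar's dispersion measure (56.8 pc cm$^{-3}$). The function names and the 1,000 pc example distance are our own illustrative choices, not part of the Astropulse software.

```python
# Illustrative evaluation of the dispersion delay t(nu) = D * DM / nu^2.
D_CONST = 4.15e3  # MHz^2 pc^-1 cm^3 s


def dispersion_delay_s(freq_mhz, dm):
    """Relative delay in seconds at frequency freq_mhz for dispersion measure dm."""
    return D_CONST * dm / freq_mhz ** 2


def band_sweep_s(dm, center_mhz=1420.0, bandwidth_mhz=2.5):
    """Delay of the bottom of the band relative to the top, in seconds."""
    lo = center_mhz - bandwidth_mhz / 2
    hi = center_mhz + bandwidth_mhz / 2
    return dispersion_delay_s(lo, dm) - dispersion_delay_s(hi, dm)


def dm_from_distance(distance_pc, n_e=0.03):
    """DM for a uniform electron density n_e (cm^-3) along distance_pc parsecs."""
    return n_e * distance_pc


sweep_crab = band_sweep_s(56.8)
print(f"sweep across the band at DM 56.8: {sweep_crab * 1e6:.0f} us")
print(f"DM for a hypothetical source at 1000 pc: {dm_from_distance(1000.0):.0f} pc cm^-3")
```

At the Crab's dispersion measure the pulse is smeared over roughly 400 μs across the band, about a thousand 0.4 μs samples, which is why dedispersion is required before any microsecond structure can be seen.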



Astropulse loops through the data at several nested levels, and considers DMs ranging
from -830 pc cm$^{-3}$ to -49.5 pc cm$^{-3}$ and from 49.5 pc cm$^{-3}$ to 830 pc cm$^{-3}$. We
chose the lower limit, 49.5 pc cm$^{-3}$, for two reasons. First, we found that our sensitivity
diminishes at low dispersion measures due to the effects of one-bit digitization. For instance,
a (hypothetical) very strong, undispersed, 2 μs signal would be undetectable, since its
signature in our data would only be five samples long, and each sample carries only one bit
of information. Therefore, we can only detect dispersed pulses. Second, local interference
at Arecibo Observatory is stronger at low dispersion measures. We tested the first effect by
inserting simulated pulses into our detection algorithm, and the second effect by examining
data from the telescope. In this way, we empirically determined a lower limit for our
dispersion measure. The upper limit of 830 pc cm$^{-3}$ was selected so that approximately
half of the volume of the Galactic plane would be visible to our search according to the
Galactic maps in Cordes & Lazio (2003).



Astropulse considers pulses of widths ranging from 0.4 μs (a single sample) to 204.8 μs.
The larger widths are tested by summing $2^\ell$ adjacent samples after dedispersion, where
$0 \le \ell \le 9$, and $\ell$ takes integer values.
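
The width search described above can be sketched as follows. This is a minimal illustration using NumPy, assuming non-overlapping groups of adjacent samples; the function name `coadd` is ours and the actual client code may organize the sums differently.

```python
import numpy as np


def coadd(power, level):
    """Sum non-overlapping groups of 2**level adjacent power samples."""
    width = 2 ** level
    usable = (len(power) // width) * width  # drop any incomplete final group
    return power[:usable].reshape(-1, width).sum(axis=1)


SAMPLE_US = 0.4
# Pulse widths tested for levels 0 through 9: 0.4 us up to 204.8 us.
widths_us = [SAMPLE_US * 2 ** level for level in range(10)]

example = np.arange(8.0)     # stand-in for dedispersed power samples
pairs = coadd(example, 1)    # sums of 2 adjacent samples
octet = coadd(example, 3)    # sum of all 8 samples
print(widths_us[0], widths_us[-1])
print(pairs, octet)
```

Each level halves the number of output values, so level 9 turns 512 samples of 0.4 μs into a single 204.8 μs bin.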



3.2.1. Incoherent dedispersion and its limitations 



We have two choices for our methodology: coherent dedispersion and incoherent 
dedispersion. Astropulse uses coherent dedispersion, whereas other radio surveys use 
incoherent dedispersion. Incoherent dedispersion is much more computationally efficient, 
and for longer timescales it's almost as good as coherent dedispersion. However, as we 
will see, Astropulse would be unable to examine the 0.4 μs timescale without coherent
dedispersion. 



Incoherent dedispersion means that the signal's power spectrum is calculated, and the 
power vs. time of each sub-band is analyzed. The method is called "incoherent" for this 
reason - the phase information about individual frequencies is lost; only the total power of 
each sub-band at each time is retained. Next, the sub-bands are realigned at all possible 
dispersion measures, in an effort to find one DM at which the components align to produce 
a large power in a short period of time. 
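
The realignment step can be illustrated with a toy example. The sketch below is not the code of any particular survey: the eight sub-bands, the 3.2 μs time bins, the trial DM of 200 pc cm$^{-3}$, and the function names are all assumptions chosen for illustration. It builds a synthetic power-vs-time array containing one dispersed pulse, then shows that shifting each sub-band by its expected delay stacks the power into a single bin.

```python
import numpy as np

D_CONST = 4.15e3  # MHz^2 pc^-1 cm^3 s


def channel_shifts(freqs_mhz, dm, dt_s):
    """Delay of each sub-band relative to the highest one, in whole time bins."""
    ref = freqs_mhz.max()
    delays_s = D_CONST * dm * (freqs_mhz ** -2.0 - ref ** -2.0)
    return np.rint(delays_s / dt_s).astype(int)


def incoherent_dedisperse(power, freqs_mhz, dm, dt_s):
    """Undo each sub-band's delay for one trial DM, then sum over sub-bands."""
    shifts = channel_shifts(freqs_mhz, dm, dt_s)
    return sum(np.roll(row, -s) for row, s in zip(power, shifts))


# Hypothetical setup: 8 sub-bands of 0.3125 MHz across the 2.5 MHz band.
freqs = 1420.0 + (np.arange(8) - 3.5) * 0.3125
dt = 3.2e-6                      # time bin, equal to 1 / (0.3125 MHz)
true_dm, t0, n_time = 200.0, 100, 1024

# Synthetic dynamic spectrum: one unit of power per sub-band, delayed by dispersion.
power = np.zeros((8, n_time))
for chan, s in enumerate(channel_shifts(freqs, true_dm, dt)):
    power[chan, t0 + s] = 1.0

aligned = incoherent_dedisperse(power, freqs, true_dm, dt)   # correct trial DM
unaligned = incoherent_dedisperse(power, freqs, 0.0, dt)     # no correction

print(aligned.argmax(), aligned.max(), unaligned.max())
```

At the correct trial DM all eight sub-bands line up in the bin where the pulse was emitted; with no correction the power stays spread over eight separate bins. Only power is used, so no phase information enters the calculation.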

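However, incoherent dedispersion is limited in two ways. First, the goal of recording
power vs. time makes sense only on a timescale greater than

$$\delta t_1 = \frac{1}{\delta\nu} \tag{4}$$

where $\delta\nu$ is the width of each sub-band. This is because of time-frequency uncertainty.

Second, in each sub-band the pulse is dispersed by some amount $\delta t_2(\nu)$, which we can
find using Equation (2):

\begin{align*}
\delta t_2(\nu) &= t(\nu) - t(\nu + \delta\nu) \\
&= \frac{\mathcal{D} \cdot \mathrm{DM}}{\nu^2} - \frac{\mathcal{D} \cdot \mathrm{DM}}{(\nu + \delta\nu)^2} \\
&= \frac{\mathcal{D} \cdot \mathrm{DM}}{\nu^2} \left[ 1 - \left( 1 + \frac{\delta\nu}{\nu} \right)^{-2} \right]
\end{align*}

Under the assumption that incoherent dedispersion divides the band up into many
small pieces, $\delta\nu \ll \Delta\nu$, or that the bandwidth is much smaller than the frequency, $\Delta\nu \ll \nu$,
we obtain $\delta\nu \ll \nu$. Then this expression becomes:

\begin{align*}
\delta t_2(\nu) &\approx \frac{\mathcal{D} \cdot \mathrm{DM}}{\nu^2} \left( 1 - 1 + 2 \frac{\delta\nu}{\nu} \right) \tag{9} \\
&= 2 (\mathcal{D} \cdot \mathrm{DM}) \frac{\delta\nu}{\nu^3} \tag{10}
\end{align*}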



Because the signal in each sub-band has a differential dispersion delay of $\delta t_2(\nu)$, the
method cannot localize the pulse better than this. (If the survey examines a large bandwidth
$\Delta\nu$, $\delta t_2(\nu)$ may vary substantially for different sub-bands, but we are considering each
sub-band individually.) For a signal of zero width, the differential dispersion delay across
the band would be equal to the duration of the dispersed pulse. However, the component
of the signal in each sub-band does not have zero width; it has width $\delta t_1$. We will assume
that the minimum timescale for incoherent dedispersion, $\delta t$, cannot be less than either $\delta t_1$
or $\delta t_2(\nu)$, so that the smallest possible width is for $\delta t = \delta t_1 = \delta t_2(\nu)$. (This assumption
is appropriate if the signal has a slowly-varying power as a function of frequency, and if
all frequencies were emitted at the same time prior to the dispersion process in the ISM.
See Section 3.2.3 for a related argument.) Then we find that the minimum timescale for
incoherent dedispersion happens when:
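\begin{align*}
\frac{1}{\delta\nu} &= 2 (\mathcal{D} \cdot \mathrm{DM}) \frac{\delta\nu}{\nu^3} \\
\delta\nu &= \sqrt{\frac{\nu^3}{2 \mathcal{D} \cdot \mathrm{DM}}} \\
\delta t &= \frac{1}{\delta\nu} = \sqrt{\frac{2 \mathcal{D} \cdot \mathrm{DM}}{\nu^3}} \tag{13}
\end{align*}

where the last step used Equation (4). For the Crab pulsar, the dispersion measure is
56.8 pc cm$^{-3}$ (Sallmen et al. 1999), and we are observing at a frequency of 1.42 GHz. Then
we can substitute $\mathcal{D} = 4.15 \times 10^3$ MHz$^2$ (pc cm$^{-3}$)$^{-1}$ s, DM = 56.8 pc cm$^{-3}$,
and $\nu = 1420$ MHz to obtain:

\begin{align*}
\delta t &= \sqrt{\frac{2 \times 4.15 \times 10^3 \times 56.8}{1420^3}\ \mathrm{s\ MHz^{-1}}} \tag{15} \\
&= \sqrt{1.65 \times 10^{-10}\ \mathrm{s}^2} \tag{16} \\
&= 12.8\ \mu\mathrm{s} \tag{17}
\end{align*}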



So for the Crab pulsar, this is a limit of 12.8 μs, or 32 samples at 1.42 GHz. For a
more distant source, the limit might be as much as 50 μs, or 124 samples. (Astropulse
considers sources with a DM as high as 830 pc cm$^{-3}$.)
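
These limits follow from setting the sub-band time resolution $1/\delta\nu$ equal to the differential delay $2 \mathcal{D} \cdot \mathrm{DM}\, \delta\nu / \nu^3$, which gives $\delta t = \sqrt{2 \mathcal{D} \cdot \mathrm{DM} / \nu^3}$. The short sketch below evaluates that expression; the function name is ours.

```python
import math

D_CONST = 4.15e3  # MHz^2 pc^-1 cm^3 s


def min_incoherent_timescale_us(dm, freq_mhz=1420.0):
    """Minimum timescale sqrt(2 * D * DM / nu^3) for incoherent dedispersion, in us."""
    # D * DM / nu^3 has units of s / MHz, and 1 / MHz = 1e-6 s.
    dt_squared_s2 = 2.0 * D_CONST * dm / freq_mhz ** 3 * 1e-6
    return math.sqrt(dt_squared_s2) * 1e6


for dm in (56.8, 139.0, 830.0):
    dt_us = min_incoherent_timescale_us(dm)
    print(f"DM {dm:6.1f}: {dt_us:5.1f} us = {dt_us / 0.4:5.1f} samples")
```

The Crab's dispersion measure gives about 12.8 μs, and the largest dispersion measure searched, 830 pc cm$^{-3}$, gives a little under 50 μs.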



3.2.2. Coherent dedispersion as deconvolution 



Coherent dedispersion (Lorimer & Kramer 2005; Hankins et al. 1987) is an alternative
technique that allows better time resolution by performing the mathematical inverse of the
ISM's dispersion operation. Coherent dedispersion deals with amplitude rather than power,
preserving phase information. In the absence of noise or scattering, and given precise
knowledge of the pulse's dispersion measure, coherent dedispersion would reconstruct the
original pulse exactly. We need to analyze the mathematical operation corresponding to
dispersion in order to find its inverse.

If $F(n)$ is the original pulse as a function of sample number $n$, suppose $D[F]$ is
the dispersed pulse. Then, relying on the time translation invariance and linearity
of the dispersion operator, we can show that $D$ is just a convolution. In particular,
$D[F] = F * D[\delta]$, where $*$ is convolution and $\delta$ is the discrete $\delta$ function. Finally, the
convolution theorem for Fourier transforms gives us:

$$\mathrm{DFT}(D[F]) = \mathrm{DFT}(F) \cdot \mathrm{DFT}(D[\delta]) \tag{18}$$



Equation (18) gives us a fast method for dedispersing a pulse $D[F]$, obtaining the
original pulse $F$. This method is fast because the Fast Fourier Transform (FFT) algorithm
is fast, taking time $O(N \log N)$ to Fourier transform $N$ samples of data. We can use this
fact to estimate the run time of Astropulse's dedispersion algorithm. Astropulse has to
dedisperse each set of $N$ samples many times, since one dedispersion must be performed for
every dispersion measure. If $M$ is the number of dispersion measures to be tested, then $N$
samples can be dedispersed in time $O(MN \log N)$. Furthermore, Astropulse must operate
on a long stream of data, of length $L$, which is much longer than the length $N$ of a single
Fourier transform. The stream is processed as $L/N$ blocks, so the total time required is $O(ML \log N)$.
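
The convolution argument can be checked numerically. In the sketch below, which uses NumPy, the dispersion kernel $D[\delta]$ is a stand-in with random spectral phases rather than the true interstellar chirp; this is an assumption made only to keep the example short. The property being demonstrated depends only on the kernel's spectrum being nonzero, which the real chirp (a pure phase) also satisfies.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

# An arbitrary complex "pulse" F(n).
pulse = rng.normal(size=N) + 1j * rng.normal(size=N)

# Stand-in for D[delta]: a kernel whose DFT has unit modulus and random phase.
kernel_spectrum = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=N))
kernel = np.fft.ifft(kernel_spectrum)


def circular_convolve(a, b):
    """Direct O(N^2) circular convolution of two equal-length sequences."""
    n = len(a)
    return np.array([sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)])


# Dispersion as a convolution, computed directly and via the convolution theorem.
dispersed_direct = circular_convolve(pulse, kernel)
dispersed_fft = np.fft.ifft(np.fft.fft(pulse) * np.fft.fft(kernel))

# Dedispersion: divide by the kernel's spectrum and transform back.
recovered = np.fft.ifft(np.fft.fft(dispersed_fft) / np.fft.fft(kernel))

print(np.allclose(dispersed_direct, dispersed_fft), np.allclose(recovered, pulse))
```

The direct sum costs $O(N^2)$ operations per block, whereas the FFT route costs $O(N \log N)$, which is the source of the run-time estimate above.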

A similar calculation suggests that incoherent dedispersion would be faster. This is
partly because incoherent dedispersion performs fewer tests. Incoherent dedispersion cannot
test as many dispersion measures as coherent dedispersion does, because its time resolution
is imperfect and it cannot always distinguish between different dispersion measures. To
test a particular dispersion measure, an algorithm must target a particular time delay
between the minimum and maximum frequency in the band. This time delay cannot
be determined more accurately than the time resolution $\delta t$ (derived in Section 3.2.1) resulting from
incoherent dedispersion. If $\delta t$ corresponds to $n$ samples, then incoherent dedispersion can
test only $1/n$ as many dispersion measures as coherent dedispersion. In other words, with
incoherent dedispersion, we would test not $M$, but $M/n$ dispersion measures.

With incoherent dedispersion, we must process $L$ samples for each dispersion measure,
so we require time $O(ML/n)$. (Some additional time is required to Fourier transform the
data into a power spectrum, but this process is not dominant.) So the time ratio between
coherent and incoherent dedispersion is $O(ML \log N / (ML/n)) = O(n \log N)$. In our case,
the time resolution $\delta t$ will vary by dispersion measure, but assuming a value of 20 μs,
corresponding to a DM of 139 pc cm$^{-3}$, and a sample duration of 0.4 μs, we obtain $n = 50$.
Then $\log N = 15$, so $n \log N = 750$, a large number. Therefore, we expect that incoherent
dedispersion is substantially faster than coherent dedispersion, as long as both methods are
applicable. Coherent dedispersion is useful in situations where a very short time resolution
is required, shorter than that permitted by incoherent dedispersion.



3.2.3. Computing the nonlinear chirp function

The "chirp" function, $\mathrm{DFT}(D[\delta])$, plays an important role in Equation (18). The function
$D[\delta]$ represents a dispersed delta function, and the DFT transforms it into the frequency
domain. In other words, we imagine a brief, strong pulse emitted by an astrophysical
source. The pulse is dispersed by its passage through the interstellar medium, resulting
in a chirp function that is spread out in time. (The name "chirp" is meant to suggest a
sound with changing frequency, just as the chirp signal has a changing radio frequency.)
Our goal in this section will be to compute the functional form of this dispersed signal in
the frequency domain.

For the sake of brevity, we will write $f$ for $D[\delta]$, and $\tilde{f}$ for its Fourier transform,
$\mathrm{DFT}(D[\delta])$. Then $f(t)$ can be written in terms of an amplitude and a phase:

$$f(t) = A(t) \exp i\theta(t) \tag{19}$$

If we assume that $A(t)$ is slowly-varying, then it is meaningful to talk about a
"frequency at time $t$", $\omega(t) = \frac{d\theta}{dt}$ or $\nu(t) = \frac{d\theta}{dt} / 2\pi$. In that case, we could compute
$\theta(t) = 2\pi \int \nu(t)\, dt$. Analogously, we will assume that $\tilde{f}(\nu)$ can be written in terms of an
amplitude and a phase:

$$\tilde{f}(\nu) = \tilde{A}(\nu) \exp i\tilde{\theta}(\nu) \tag{20}$$

Using similar logic, we could hypothesize that $\tilde{\theta}(\nu) = 2\pi \int t(\nu)\, d\nu$. This hypothesis
turns out to be correct, except for a minus sign, which we will now demonstrate.

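We begin by finding the Taylor expansion of $\tilde{\theta}(\nu)$ around a particular frequency $\nu_1$.
This gives us the expression:

$$\tilde{\theta}(\nu) \approx \tilde{\theta}(\nu_1) + \left. \frac{d\tilde{\theta}}{d\nu} \right|_{\nu_1} (\nu - \nu_1) \tag{21}$$

Equation (21) corresponds to the component of the signal with frequencies near $\nu_1$. We
can determine the arrival time of these frequencies by finding the Fourier transform of the
frequency distribution:

\begin{align*}
g(t) &= \int \tilde{A}(\nu) \exp\left( i \left( \tilde{\theta}(\nu_1) + \left. \frac{d\tilde{\theta}}{d\nu} \right|_{\nu_1} (\nu - \nu_1) \right) \right) \exp(i 2\pi \nu t)\, d\nu \tag{22} \\
&= \int \tilde{A}(\nu) \exp\left( i \left( \left. \frac{d\tilde{\theta}}{d\nu} \right|_{\nu_1} + 2\pi t \right) \nu \right) \exp\left( i \tilde{\theta}(\nu_1) - i \left. \frac{d\tilde{\theta}}{d\nu} \right|_{\nu_1} \nu_1 \right) d\nu \tag{23}
\end{align*}

When $t = -\frac{1}{2\pi} \left. \frac{d\tilde{\theta}}{d\nu} \right|_{\nu_1}$, the integrand of Equation (23) becomes constant, so the integral is
infinite. This, then, is $t(\nu_1)$, the arrival time of frequency $\nu_1$. Therefore,

\begin{align*}
\left. \frac{d\tilde{\theta}}{d\nu} \right|_{\nu_1} &= -2\pi t(\nu_1) \tag{24} \\
\tilde{\theta}(\nu) &= -2\pi \int t(\nu)\, d\nu \tag{25}
\end{align*}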



This demonstrates that our hypothesis was correct, except for a minus sign. So to
compute the phase of $\tilde{f}(\nu)$, the chirp function in the frequency domain, we need only find
$t(\nu)$ and integrate. We can rewrite Equation (2) by combining the constant factors into $B = \mathcal{D} \times \mathrm{DM}$:

$$t(\nu) = B / \nu^2 \tag{26}$$

Since this equation gives the relative time delay rather than the absolute time delay, it
will still be true if we add or subtract a constant of our choice. Using band center frequency
$\nu_0 = 1420$ MHz, we obtain:

$$t(\nu) = B \left( \frac{1}{\nu^2} - \frac{1}{\nu_0^2} \right). \tag{27}$$



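We can find the exponent of the frequency-domain amplitude by integrating this
equation. After including the factor of $-2\pi$ from Equation (25) we obtain:

\begin{align*}
-2\pi \int t(\nu)\, d\nu &= 2\pi B \left( \frac{1}{\nu} + \frac{\nu}{\nu_0^2} + C \right) \tag{28} \\
&= 2\pi B \left( \frac{\nu_0^2}{\nu \nu_0^2} + \frac{\nu^2}{\nu \nu_0^2} + C \frac{\nu \nu_0^2}{\nu \nu_0^2} \right) \tag{29} \\
&= 2\pi B \frac{(\nu - \nu_0)^2}{\nu \nu_0^2}, \tag{30}
\end{align*}

where we have judiciously chosen $C = -2/\nu_0$ to simplify the equation. This results in a
frequency domain amplitude:

$$\tilde{f}(\nu) = \tilde{A}(\nu) \exp\left( 2\pi i B \frac{(\nu - \nu_0)^2}{\nu \nu_0^2} \right). \tag{31}$$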



Since $\nu \approx \nu_0$ in our application, the chirp function is approximately $\exp\left( 2\pi i B \frac{(\nu - \nu_0)^2}{\nu_0^3} \right)$,
so that the exponent can be approximated as a quadratic in $\nu$. But the extra factor $\nu_0 / \nu$
is easily included in our dedispersion algorithm, so we have done so. This extra factor is
close to 1, since in our case $\nu_0 = 1.42$ GHz, and $\nu$ differs from $\nu_0$ by at most 1.25 MHz.
(We can ignore the factor $\tilde{A}(\nu)$, assuming it is approximately constant in $\nu$ over our small
bandwidth.)
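
The chirp phase derived in this section can be turned into a short numerical experiment. The sketch below uses NumPy and is not the Astropulse client code: the function names are ours, the amplitude factor is taken to be 1, and we assume that positive baseband frequencies returned by `np.fft.fftfreq` correspond to sky frequencies above the band center (a flipped sideband would simply swap the sign of the DM). It disperses a single-sample impulse with the phase $2\pi B (\nu - \nu_0)^2 / (\nu \nu_0^2)$ and then removes the dispersion with the conjugate phase.

```python
import numpy as np

D_CONST_HZ = 4.15e15   # 4.15e3 MHz^2 pc^-1 cm^3 s, expressed in Hz^2 pc^-1 cm^3 s
NU0_HZ = 1.42e9        # band center
SAMPLE_S = 0.4e-6      # complex sample interval (2.5 MHz bandwidth)


def chirp_phase(n_samples, dm):
    """Phase 2*pi*B*(nu - nu0)^2 / (nu * nu0^2) on the FFT frequency grid."""
    f = np.fft.fftfreq(n_samples, d=SAMPLE_S)  # nu - nu0 in Hz, within +/-1.25 MHz
    nu = NU0_HZ + f
    b = D_CONST_HZ * dm
    return 2.0 * np.pi * b * f ** 2 / (nu * NU0_HZ ** 2)


def disperse(signal, dm):
    """Apply the chirp: multiply the spectrum by exp(+i * phase)."""
    return np.fft.ifft(np.fft.fft(signal) * np.exp(1j * chirp_phase(len(signal), dm)))


def dedisperse(signal, dm):
    """Coherent dedispersion: multiply the spectrum by exp(-i * phase)."""
    return np.fft.ifft(np.fft.fft(signal) * np.exp(-1j * chirp_phase(len(signal), dm)))


n = 4096
impulse = np.zeros(n, dtype=complex)
impulse[2048] = 1.0                      # a one-sample pulse

smeared = disperse(impulse, 56.8)        # Crab-like dispersion measure
restored = dedisperse(smeared, 56.8)

peak_smeared = (np.abs(smeared) ** 2).max()
peak_restored = (np.abs(restored) ** 2).max()
print(peak_smeared, peak_restored, np.abs(restored).argmax())
```

Because the chirp is a pure phase, the total energy is unchanged by dispersion, but the peak power of the smeared pulse falls far below that of the original one-sample impulse. Applying the conjugate phase restores the impulse to its original sample, which is the sense in which coherent dedispersion is an exact inverse when the DM is known.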



3.3. Thresholds 



Astropulse searches for pulses whose power exceeds certain thresholds. These 
thresholds can be calculated either experimentally or theoretically. We'll start by finding 
the theoretical values, then point out some of the uncontrollable factors that make these 
values inaccurate, and finally we'll describe a Monte Carlo simulation method for calculating 
thresholds. 
3.3.1. Single pulse thresholds: theory



We want to calculate the distribution (pdf) of the integrated noise power in $2^\ell$ samples,
after dedispersion. (Here, the "power" refers to the squared magnitude of the
amplitude, where the undispersed time series has amplitudes ±1.) We will perform some
calculations and conclude that the dedispersed signal $f_d(t)$ is distributed like a complex
Gaussian at each time $t$.

First, we assume that the pre-dedispersion time series is pure white noise; that is, each
bit of a two bit complex sample is independently distributed with equal probability of a 0
or 1, so each $f(t)$ has equal probability for $\pm 1 \pm i$. (Note that this $f$ is discrete-valued
with a discrete time argument, as opposed to the continuous-valued $f$ with continuous
time argument described in Section 3.2.3.) Then we deconvolve this data by FFT. In other
words,

$$\tilde{f}(k) = \frac{1}{\sqrt{N}} \sum_{t=0}^{N-1} f(t) \exp(2\pi i k t / N). \tag{32}$$



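The distribution of a single $\tilde{f}(k)$ is Gaussian (by the central limit theorem), and to
deduce its variance, we will find the variance of its real and imaginary components $\Re(\tilde{f}(k))$
and $\Im(\tilde{f}(k))$ independently:

\begin{align*}
\Re(\tilde{f}(k)) &= \frac{1}{\sqrt{N}} \sum_{t=0}^{N-1} \left[ \Re(f(t)) \cos(2\pi k t / N) - \Im(f(t)) \sin(2\pi k t / N) \right] \tag{33} \\
\mathrm{Var}(\Re(\tilde{f}(k))) &= \frac{1}{N} \sum_{t=0}^{N-1} \left[ \mathrm{Var}(\Re(f(t))) \cos^2(2\pi k t / N) + \mathrm{Var}(\Im(f(t))) \sin^2(2\pi k t / N) \right] \tag{34} \\
&= \frac{1}{N} \sum_{t=0}^{N-1} \left[ \cos^2(2\pi k t / N) + \sin^2(2\pi k t / N) \right] \tag{35} \\
&= 1 \tag{36}
\end{align*}

Equation (35) follows from the previous equation because $f(t)$ can only take the values
$\pm 1 \pm i$, therefore its real and imaginary components each have variance of 1.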



Therefore, the variance of the real component of $\tilde{f}(k)$ is 1, and the same argument
holds for the imaginary component. Once we have obtained $\tilde{f}(k)$, the remaining steps in
the dedispersion are to multiply by a frequency-domain chirp function, followed by Fourier
transforming back to the time domain.

The frequency-domain chirp function has the form $e^{i\theta(k)}$ for some real phase $\theta$. Since
$\tilde{f}(k)$ is already a complex number with random phase, multiplication by another complex
phase has no effect on the probability distribution of $\tilde{f}(k)$. Finally, we run the inverse Fourier
transform to obtain a dedispersed signal, $f_d(t)$. Since $\mathrm{Var}(\Re(\tilde{f}(k))) = \mathrm{Var}(\Re(f(t))) = 1$, the
same mathematical argument that we applied to the forward Fourier transform also applies
to the inverse Fourier transform. Then we find that the dedispersed signal $f_d(t)$ is again a
complex Gaussian, such that $\Re(f_d(t))$ and $\Im(f_d(t))$ are Gaussians with variance 1.

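The power in each sample after dedispersion will be distributed as $|\Re(f_d(t))|^2 +
|\Im(f_d(t))|^2$, the sum of the squares of two standard Gaussians. This distribution is easily
calculated; writing $x = \Re(f_d(t))$ and $y = \Im(f_d(t))$, the joint probability distribution is:

\begin{align*}
p(x, y)\, dx\, dy &= \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, dx\, dy \tag{37} \\
&= \frac{1}{2\pi} e^{-r^2/2}\, r\, dr\, d\theta \tag{38} \\
&\rightarrow e^{-r^2/2}\, r\, dr \quad \text{(integrating over } \theta\text{)} \tag{39} \\
&= e^{-u}\, du, \tag{40}
\end{align*}

where $u = r^2/2 = (x^2 + y^2)/2$ is half the power in one sample, in the time domain. Therefore,
half the power is exponentially distributed with mean 1; or equivalently, the power is
exponentially distributed with mean 2.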



In future calculations, we will normalize to half of this power, so that the average 
power per sample is 1. 
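
The exponential distribution can be checked with a small simulation. The sketch below uses NumPy and is our own illustration, not the Monte Carlo method used for the survey thresholds: it draws one-bit complex noise, applies a unit-modulus quadratic-phase filter as a stand-in for the true chirp (an assumption, chosen so that each output sample mixes about a thousand input samples), and compares the tail of the normalized power with $e^{-P}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2 ** 15

# One-bit complex noise: real and imaginary parts are each +1 or -1.
samples = rng.choice([-1.0, 1.0], size=N) + 1j * rng.choice([-1.0, 1.0], size=N)

# Stand-in chirp: quadratic phase in the integer frequency index k,
# spreading each input sample over roughly N / 32 output samples.
k = np.fft.fftfreq(N, d=1.0 / N)
phase = np.pi * k ** 2 / (32 * N)

# Forward transform with the 1/sqrt(N) normalization of Equation (32),
# multiplication by a pure phase, and the inverse transform.
spectrum = np.fft.fft(samples, norm="ortho") * np.exp(-1j * phase)
dedispersed = np.fft.ifft(spectrum, norm="ortho")

power = np.abs(dedispersed) ** 2
normalized = power / 2.0                 # normalized so the mean is 1

mean_power = power.mean()
frac_above_3 = np.mean(normalized > 3.0)

print(mean_power)                        # 2, by Parseval's theorem
print(frac_above_3, np.exp(-3.0))        # simulated tail vs. e^-3
```

The mean power of 2 is exact, since both transforms and the phase multiplication preserve total power; the tail fraction is a statistical estimate and should agree with $e^{-3} \approx 0.05$ only to within sampling error.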



So for instance, if after dedispersion we find that a certain sample has a power $P$, we
conclude that only one in $e^P$ samples has a comparable power. To ascertain how unlikely
this is, we need to calculate how many such samples we have examined over the entire
course of the experiment. This would be:

$$48\ \mathrm{TB} \times 4 \cdot 10^{12}\ \text{samples per TB} \times 14208\ \text{DMs} \times 2\ \text{DM signs} = 5.45 \times 10^{18} = e^{43.1}. \tag{41}$$

So far, we have considered pulses that are one sample in width. However, we are
also searching for pulses of width 2, 4, 8, ..., 512 samples. To search for wider pulses, we
sum the power over $2^\ell$ adjacent samples, for each integer $\ell$ between 0 and 9. We will
refer to this summation as "co-adding," and each of the 10 possible widths is a "co-add."
We compare this summed power to an appropriate threshold, which is larger at higher
co-adds. There are half as many potential pulses at each co-add, compared with the
previous co-add. (So Equation (41) underestimates the number of potential pulses by a factor
of $1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{512} \approx 2$.)

Therefore, with a threshold of 43.8 for one-sample potential pulses, and appropriate
thresholds for potential pulses of other sizes, we would rule out all but one noise event
over the course of our entire observation history. Thus, if Astropulse had no mechanism to
remove noise other than to raise the detection threshold, our threshold would have to be
quite high. Fortunately, we do have an alternative mechanism; we discuss this issue further
in Section 5.1.8.

Rather than raising the threshold to 43.8, it seems more prudent to aim for one noise 
event to exceed threshold in each workunit - a unit of data defined to be 13 s for logistical 
reasons related to our data processing methods - and sort out false pulses later. (More than 
one pulse per workunit would be difficult to store in our database.) 

In this case, we just want to find C, the number of samples we examine, multiplied
by 2 to account for co-adds. This gives us the number of potential pulses (counting all
co-adds) per workunit.

$$C = 2^{25}\ \text{samples per workunit} \times 14208\ \text{DMs} \times 2\ \text{signs} \times 2\ \text{from co-adds} = e^{28.3}. \tag{42}$$

Since the standard deviation for the exponential is $\sigma = 1$, this means that by setting
the threshold at 28.3 we would be looking for pulses that are $28.3 - 1 = 27.3\sigma$ above the
mean.
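
The trial counts behind these thresholds are simple to reproduce. The sketch below recomputes the natural logarithm of the number of trials for the whole survey and for a single workunit, using only the figures quoted in the text; the function and constant names are illustrative.

```python
import math

SAMPLES_PER_TB = 4e12   # from the text: 4 x 10^12 samples per TB
N_DMS = 14208           # trial dispersion measures
N_DM_SIGNS = 2          # positive and negative DM

def ln_trials_survey(terabytes=48.0):
    """ln of the number of one-sample trials over the whole survey."""
    return math.log(terabytes * SAMPLES_PER_TB * N_DMS * N_DM_SIGNS)

def ln_trials_workunit(samples=2 ** 25, coadd_factor=2):
    """ln of the number of potential pulses (all co-adds) in one workunit."""
    return math.log(samples * N_DMS * N_DM_SIGNS * coadd_factor)

survey_one_sample = ln_trials_survey()
survey_all_coadds = survey_one_sample + math.log(2)  # factor ~2 from co-adds
workunit_all_coadds = ln_trials_workunit()

print(f"survey, one-sample pulses: e^{survey_one_sample:.1f}")
print(f"survey, all co-adds:       e^{survey_all_coadds:.1f}")
print(f"workunit, all co-adds:     e^{workunit_all_coadds:.1f}")
```

Because a one-sample power P is exceeded with probability $e^{-P}$, each exponent is also the threshold at which one noise event is expected among that many trials.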

When we co-add $n = 2^\ell$ samples to make one co-added potential pulse, we are adding
up that many exponential distributions. The resulting power has a gamma distribution,
with scale parameter 1 and shape parameter n. The pdf is

$$\frac{1}{\Gamma(n)}\, x^{n-1} e^{-x}, \tag{43}$$

and the complementary cumulative distribution function is defined to be:

$$\int_x^\infty \frac{1}{\Gamma(n)}\, x'^{\,n-1} e^{-x'}\, dx' = \frac{\Gamma(n, x)}{\Gamma(n)}, \tag{44}$$

where $\Gamma(n, x)$ is the upper incomplete gamma function. The first few pdfs are shown
in Figure 1.

Then for each $n = 2^\ell$, we want to select a threshold, $H_n$, such that $\frac{1}{C} = \Gamma(n, H_n)/\Gamma(n)$.
We admit pulses of width n only if they have power greater than or equal to $H_n$. Since a
pulse of power $H_n$ would occur by chance with probability $\frac{1}{C}$, and there are precisely C
potential pulses in a workunit, we expect to obtain $C \cdot \frac{1}{C} = 1$ false positive per workunit.

Because the probability distribution comes from a sum of identical exponential
distributions, we can say that for co-add $n = 2^\ell$, the variance is n times higher, and the
standard deviation is $\sqrt{n}$ times higher, than for an exponential. If we define m to be the
number of standard deviations of our threshold above the mean, then we are looking for
pulses at $m_n = (H_n - n)/\sqrt{n}$. A computation of $m_n$, using our actual thresholds $H_n$ (as
determined by simulation, rather than theory) can be found in Table 1.
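
For comparison with the simulated thresholds, the theoretical ones can be computed directly. This is a sketch under the white-noise model only: it solves $\Gamma(n, H_n)/\Gamma(n) = 1/C$ by bisection, using the closed form of the upper incomplete gamma function for integer shape n, $\Gamma(n, x)/\Gamma(n) = e^{-x} \sum_{k=0}^{n-1} x^k/k!$. The function names and the bracketing interval are assumptions made for illustration.

```python
import math

# ln C for one workunit: 2^25 samples x 14208 DMs x 2 signs x 2 from co-adds
LOG_C = math.log(2 ** 25 * 14208 * 2 * 2)

def log_ccdf_gamma(n, x):
    """ln of Gamma(n, x) / Gamma(n) for integer shape n and x > 0."""
    # Work with logarithms of the series terms to avoid overflow at large n.
    logs = [k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    peak = max(logs)
    return -x + peak + math.log(sum(math.exp(v - peak) for v in logs))

def threshold(n, log_c=LOG_C):
    """Power H_n exceeded by chance with probability 1/C, found by bisection."""
    lo = float(n)                               # the mean: far too likely
    hi = float(n) + 50.0 * math.sqrt(n) + 100.0 # generous upper bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_ccdf_gamma(n, mid) > -log_c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sigmas_above_mean(n, log_c=LOG_C):
    """m_n = (H_n - n) / sqrt(n)."""
    return (threshold(n, log_c) - n) / math.sqrt(n)

for ell in range(10):
    n = 2 ** ell
    print(f"n = {n:3d}  H_n = {threshold(n):7.1f}  m_n = {sigmas_above_mean(n):5.1f}")
```

For n = 1 the series has a single term, so the threshold reduces to ln C, the one-sample value derived above.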



3.3.2. Expected discrepancies with the model 
A few differences from the model can be expected: 



1. Hydrogen line and filter shape. We assumed above that the input data is white
noise. In practice, this is not the case, because a portion of our band has higher power
due to the hyperfine hydrogen line. The strength of this line can vary depending on
our RA and dec. Then the f(k) no longer have equal standard deviations. This will
cause some correlation between the deconvolved power of adjacent samples, which will
modify the pdf of the binned power, increasing the variance.

To see this, consider the simplest, most extreme case: we imagine that the hydrogen
line takes the form of a strong delta function in the frequency domain of amplitude A
at frequency $k_0$, where A is distributed randomly according to a Gaussian distribution
with standard deviation $\sigma$ and mean 0. Then if $f_d$ is the dechirped amplitude in
the time domain, $f_d(t) = A e^{2\pi i k_0 t/N}$. (In other words, the dispersion is not relevant,
since the hydrogen line has a single frequency, and we are momentarily assuming
it overwhelms the noise.) So $|f_d(t)|^2$ is exponential with power $\sigma^2$, as discussed in
Section 3.3.1. Now consider two nearby times $t_1 \approx t_2$ such that the phase of the
hydrogen line does not change much between these samples. We want to sum the
amplitudes $f_d(t)$ at these nearby times in order to build a co-add. Then $|f_d(t_1) + f_d(t_2)|$
is $A\,|e^{2\pi i k_0 t_1/N} + e^{2\pi i k_0 t_2/N}|$. Since the phases are similar, this is roughly 2A, which
is Gaussian with standard deviation $2\sigma$ and variance $4\sigma^2$. Whereas if we summed
samples without the hydrogen line, adding identical and independently distributed
(iid) exponentials, the variances would simply add to give $2\sigma^2$. So this model hydrogen
line increases the variance.

In actuality, the effect of the hydrogen line is not so pronounced, but the idea is
similar. In the same way, the nonuniform shape of our low pass filters also causes the
signal to differ from white noise.

2. Other disparities 

Even in the absence of the hydrogen line, tests reveal other differences between the 
theoretical and actual distributions. For high co-adds, the variance is slightly less 
than expected. 

It's easy to see that power per sample cannot be independently distributed, even in
the case of white noise. This is because the total power over all samples must be a
constant; in our case, the constant is 32,768 = $2^{15}$, the total number of samples in an
FFT. This would certainly result in a smaller variance, but we have not established
whether this effect suffices to explain the observed disparity.
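
The constant-total-power constraint is a consequence of Parseval's theorem and can be demonstrated on a small example. The sketch below is illustrative only: it uses a 64-point unitary DFT written out directly rather than a 32,768-point FFT, and random phases stand in for the real chirp. Every 1-bit complex input sample has power exactly 2, so the total is fixed before and after the phase rotation.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Unitary discrete Fourier transform (O(N^2), adequate for a small demo)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dedisperse(x, phases):
    """Forward transform, multiply each bin by a unit-modulus phase, invert."""
    spectrum = dft(x)
    rotated = [s * cmath.exp(1j * p) for s, p in zip(spectrum, phases)]
    return dft(rotated, inverse=True)

rng = random.Random(7)
N = 64
samples = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(N)]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]  # stand-in chirp
output = dedisperse(samples, phases)

total_in = sum(abs(s) ** 2 for s in samples)
total_out = sum(abs(s) ** 2 for s in output)
print(f"total power in  = {total_in:.6f}")
print(f"total power out = {total_out:.6f}")
```

After normalizing to half the power, the fixed total equals the number of samples in the transform, which is the constant quoted above.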



3.3.3. Single pulse thresholds: Monte Carlo simulation 



To choose our thresholds for the single pulse search, we ran the client on 10 "noise"
workunits, which we had constructed to contain only white noise. We kept track of the
strongest pulses we found at each co-add. The second largest pulse out of 10 is roughly the
90th percentile, so we set our thresholds at that point. This method gives thresholds that
are reasonable as long as we don't demand that we detect precisely equal numbers of pulses
at each co-add. The thresholds suggested by this simulation were within 1% to 5% of the
theoretical values.

Because the "noise" workunits contained only white noise, with no hydrogen line,
some of the deviations described above would not be expected to occur. For this reason,
our simulation was an imperfect model. However, our concern in this case was not to
model the noise and hydrogen line perfectly, but to obtain a rough estimate for the correct
thresholds. The hydrogen line looks slightly different at different points on the sky, so a
perfect simulation would be impossible in any event. As discussed above, our thresholds
are supplemented by our RFI and noise rejection algorithms, so we have some flexibility in
setting thresholds.

Table 1: Pulse area thresholds $H_n$, in normalized units such that one sample has an expected
power of 1 unit, derived from the Monte Carlo simulation. We denote the implied number
of standard deviations above the mean by "m".

ℓ     n      H_n      m
0     1      29.1     28.1
1     2      31.6     20.9
2     4      37.9     16.6
3     8      49.4     14.6
4     16     61.3     11.3
5     32     87.0     9.7
6     64     128.9    8.1
7     128    212.6    7.5
8     256    362.0    6.6

3.4. Expected sensitivity 

3.4.1. Scattering



We will discuss the sensitivity of Astropulse and other surveys, but we will first
consider scattering. Scattering impacts the sensitivity of a survey by limiting its resolution.
We suppose that an instantaneous pulse would be broadened by scattering to a width $\Delta t_\mathrm{sc}$.
Then we can estimate $\Delta t_\mathrm{sc}$ using the empirical formula given in Lorimer & Kramer (2005):

$$\log \Delta t_\mathrm{sc,ms} = -6.46 + 0.154\,(\log \mathrm{DM}) + 1.07\,(\log \mathrm{DM})^2 - 3.86 \log \nu_\mathrm{GHz}. \tag{45}$$



However, this formula applies to sources in the Milky Way. Astropulse spends a
substantial fraction of the time looking outside our Galaxy, in which case $\Delta t_\mathrm{sc}$ should be
much smaller, even for large DMs. But the distribution of the intergalactic medium (IGM)
is not well understood. Lovell et al. (2007) find that extragalactic radio sources at redshifts
greater than z = 2 do not have microarcsecond structure, suggesting that they are scatter
broadened by turbulence in the IGM. This fact can be used to estimate the width $\Delta t_\mathrm{sc}$
of the broadened pulse. A microarcsecond of angular broadening at z = 2 corresponds to
a pulse width of $\Delta t_\mathrm{sc} = \theta^2 d/c = 8$ µs (using d = 3.57 Gpc). According to Ioka (2003),
this is at a DM of about 2000 pc cm$^{-3}$. So perhaps we can assume that at the (smaller)
DMs of our experiment, pulses will have reasonably small widths. For instance, inside the
Galaxy, a DM of 830 pc cm$^{-3}$ would have a scattering width 400 times smaller than a DM
of 2000 pc cm$^{-3}$. Even if the scattering width were nearly 8 µs, Astropulse would still be good at
detecting such pulses. (The threshold is just twice as high as for 1-sample pulses.)
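
Both numerical claims in this subsection can be reproduced with a few lines. The sketch below evaluates the empirical Galactic scattering formula (logarithms taken as base 10, widths in ms, frequency in GHz) and the geometric delay $\theta^2 d/c$ for one microarcsecond of angular broadening. The physical constants and function names are supplied for illustration and are not taken from the paper.

```python
import math

SPEED_OF_LIGHT_M_S = 2.998e8
METERS_PER_PARSEC = 3.0857e16
RAD_PER_MICROARCSEC = math.pi / (180.0 * 3600.0 * 1e6)

def scattering_width_ms(dm, freq_ghz):
    """Empirical Galactic scattering width in ms for a given DM (pc cm^-3)."""
    log_dm = math.log10(dm)
    log_width = (-6.46 + 0.154 * log_dm + 1.07 * log_dm ** 2
                 - 3.86 * math.log10(freq_ghz))
    return 10.0 ** log_width

def broadening_delay_s(theta_rad, distance_pc):
    """Pulse broadening theta^2 d / c for angular broadening theta."""
    return theta_rad ** 2 * distance_pc * METERS_PER_PARSEC / SPEED_OF_LIGHT_M_S

# Ratio of scattering widths at DM 2000 and DM 830; the frequency term cancels.
ratio = scattering_width_ms(2000.0, 1.42) / scattering_width_ms(830.0, 1.42)
# One microarcsecond of broadening at d = 3.57 Gpc, expressed in microseconds.
delay_us = 1e6 * broadening_delay_s(RAD_PER_MICROARCSEC, 3.57e9)

print(f"width(DM=2000) / width(DM=830) = {ratio:.0f}")
print(f"broadening delay = {delay_us:.1f} microseconds")
```

The ratio depends only on the two DM terms of the formula, so it holds at any observing frequency.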



3.4.2. Sensitivity of Astropulse



To calculate Astropulse's expected sensitivity as a pulse area in Jy µs, we can follow
the treatments in Rohlfs & Wilson (2000) and Van Vleck & Middleton (1966), which
discuss the effect of "clipping" a noisy analog signal, changing it into a one-bit digital time
series.

From these sources we conclude that if F is the flux density and N is the duration in
samples of the minimal detectable pulse, then its pulse area is:

$$F \cdot (N t_\mathrm{sample}) = \frac{\pi}{2}\, \frac{T}{G}\, t_\mathrm{sample}\, (H_N - N). \tag{46}$$



In this expression:

1. $H_N$ is the power threshold derived from the gamma distribution in Section 3.3.1, and
is dependent on N.

2. $T \approx 30$ K is the system temperature.³

3. $G = 10$ K Jy$^{-1}$ is the telescope gain, roughly equal to $A/2k$, where:

4. $A = \lambda^2/\Omega$ is the effective area of the telescope and k is Boltzmann's constant.

5. $\lambda = 21$ cm is the wavelength of the signal.

6. $\Omega = 8.1 \cdot 10^{-7}$ sr is the beam's solid angle.

³ http://www.naic.edu/alfa/gen_info/info_obs.shtml

So the resulting pulse area is $1.9\,(H_N - N)$ Jy µs.
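
The coefficient 1.9 can be recovered from the quoted parameters. The sketch below assumes that the prefactor is $(\pi/2)(T/G)\,t_\mathrm{sample}$, with $\pi/2$ being the usual small-signal correction for one-bit clipping; that assumption reproduces the stated coefficient, but the function is an illustration rather than the survey's own code. The example threshold of 29.1 for a one-sample pulse is the Table 1 value.

```python
import math

def pulse_area_jy_us(h_n, n, t_sys_k=30.0, gain_k_per_jy=10.0, t_sample_us=0.4):
    """Minimum detectable pulse area in Jy microseconds for an n-sample pulse.

    Assumed form: (pi / 2) * (T / G) * t_sample * (H_N - N).
    """
    return (math.pi / 2.0) * (t_sys_k / gain_k_per_jy) * t_sample_us * (h_n - n)

# Coefficient multiplying (H_N - N): set H_N - N = 1.
coefficient = pulse_area_jy_us(h_n=2.0, n=1)
one_sample_area = pulse_area_jy_us(h_n=29.1, n=1)

print(f"coefficient      = {coefficient:.2f} Jy us")
print(f"one-sample pulse = {one_sample_area:.0f} Jy us")
```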



3.4.3. Sensitivity comparison



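While Astropulse detects a signal coherently, other undirected radio surveys use
incoherent detection schemes. Typically they use a filter bank, dividing the spectrum into
N sub-bands as described in Section 3.2. Deneva et al. (2009) give the sensitivity formula
as:

$$S_i = \frac{W}{W_i}\, \frac{m\, S_\mathrm{sys}}{\sqrt{N_\mathrm{pol}\, \Delta\nu\, W}}, \tag{47}$$

$$W = \left(W_i^2 + \Delta t_\mathrm{DM,ch}^2 + \Delta t_\mathrm{DM,err}^2 + \Delta t_\mathrm{sc}^2\right)^{1/2}. \tag{48}$$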



$S_\mathrm{sys}$ is the system-equivalent flux density.

m is the desired number of standard deviations for the detection threshold.

W is the effective width of the pulse, including broadening due to dispersion and
scattering.

$W_i$ is the intrinsic width of the pulse, prior to broadening.

$\Delta t_\mathrm{DM,ch}$ is the dispersion within one channel.

$\Delta t_\mathrm{DM,err}$ is the error caused by looking at the wrong dispersion measure. We have a
DM error of 1/2 the DM step. (The time error $\Delta t_\mathrm{DM,err}$ depends on the bandwidth as
well as the DM error.)

$\Delta t_\mathrm{sc} \propto f^{-3.86}$ is the error caused by scattering broadening, where f is the pulse's
frequency.



Equation (47) assumes that the channel bandwidth is not so narrow that we are sampling
beyond the Nyquist rate. If the channel bandwidth were that narrow, there would be
another contribution to the effective width; but all surveys are careful not to sample beyond
the Nyquist rate.

If one cares about the pulse area in Jy µs rather than the instantaneous flux density,
a simpler expression will suffice. If one has observed at a single polarization, the minimum
detectable pulse area for that polarization will be $A = \frac{m S_\mathrm{sys}}{2\sqrt{Bt}}\, t = m\, \frac{S_\mathrm{sys}}{2} \sqrt{t/B}$, and the pulse
area of both polarizations together will be $A = m\, S_\mathrm{sys} \sqrt{t/B}$. If both polarizations are
observed together and their power is summed, then the minimum detectable pulse area of
the two polarizations together is smaller by a factor $\sqrt{2}$, assuming an unpolarized signal.
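
As a rough check, the total (two-polarization) pulse area can be written as $m\,S_\mathrm{sys}\sqrt{t/B}\,/\sqrt{N_\mathrm{pol}}$ with $S_\mathrm{sys} = T/G$, and evaluated with the parameters listed for the incoherent surveys in Table 2. The sketch below does this for two of them; the function name and argument units are illustrative choices, and the formula is the reading of the text described here rather than a quotation of it.

```python
import math

def min_pulse_area_jy_us(m, t_sys_k, gain_k_per_jy, t_us, bandwidth_mhz, n_pol):
    """Minimum detectable pulse area (both polarizations) in Jy microseconds.

    With t in microseconds and B in MHz, sqrt(t / B) comes out in microseconds.
    """
    s_sys_jy = t_sys_k / gain_k_per_jy
    return m * s_sys_jy * math.sqrt(t_us / bandwidth_mhz) / math.sqrt(n_pol)

# Parameters as listed in Table 2.
deneva = min_pulse_area_jy_us(m=5, t_sys_k=30, gain_k_per_jy=10,
                              t_us=64, bandwidth_mhz=100, n_pol=2)
mclaughlin = min_pulse_area_jy_us(m=5, t_sys_k=21, gain_k_per_jy=0.7,
                                  t_us=250, bandwidth_mhz=288, n_pol=2)

print(f"Deneva et al.:     {deneva:.1f} Jy us")
print(f"McLaughlin et al.: {mclaughlin:.0f} Jy us")
```

The Astropulse entry is not computed this way; it comes from the one-bit expression of Section 3.4.2.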

As discussed in Section 3.4.1, the scattering error is likely to become less important
when we are looking outside our Galaxy. So, with an understanding that different conditions
would apply when observing the Galaxy, we will ignore the scattering error in Table 2 and
set $t = t_\mathrm{sample}$. In Table 2 we list characteristics of Astropulse and other surveys and the
following conventions are observed:



• m: number of standard deviations for threshold. For any survey that uses incoherent
dedispersion, this refers to the standard deviation of a Gaussian distribution, so
m = 6 suffices to rule out all but 1 in $10^9$ spurious detections due to noise. However,
for Astropulse, the distribution is a chi-square, as discussed in Section 3.3. In
the worst-case scenario - a one-sample pulse - this chi-square is equivalent to an
exponential, and m = 23 is required to rule out all but 1 in $10^9$ spurious detections
due to noise. In fact, we selected the value m = 30 for Astropulse in the case of
one-sample pulses. For all surveys, the optimal value of m is not determined only
by the statistical distribution, but also by RFI and noise rejection. For Astropulse,
the statistics are discussed at length in Section 3.3.1 and RFI mitigation is discussed
in Section 5. In Table 2, we have listed the values of m reported by each survey in
available publications. If we could not determine the survey's m-value, we assumed
m = 6. We are assuming that astrophysical pulses which surpassed each survey's
stated threshold would be detected by the survey and could be distinguished from
noise and RFI.

• $t_\mathrm{sample}$: the time resolution of the survey.

• t: the minimum effective duration of a pulse after dedispersion.

• beam $\Omega$: the telescope beam width, in steradians.

• beams: number of simultaneous beams.

• $t_\mathrm{obs}$: observation time per beam, in hours.

• $N_\mathrm{pol}$: Astropulse detects a pulse using data from a single polarization; many surveys
combine two polarizations.

• sens: the minimum detectable pulse area. For each survey, the listed sensitivities
apply for pulses which are unresolved by that survey. Pulses of duration 0.4 µs
are unresolved by all surveys, including Astropulse. For pulses of greater duration,
Astropulse's sensitivity will degrade. Table 1, discussed earlier in Section 3.3.1, depicts
the relative increase in Astropulse's minimum detectable pulse area for wider pulses.
Other surveys' sensitivities do not degrade until their time resolution is reached.



In order to compare the surveys using a concrete example, we will also provide
information about each survey's ability to detect evaporating primordial black holes, under
specific assumptions: that $M = 10^8$ kg of the black hole's mass is transformed into a radio
signal of bandwidth 1 GHz, and that scattering broadening does not substantially interfere
with detection. These assumptions are not intended to be representative of all models of
evaporating black holes, and black holes themselves are not the only potential source of
microsecond pulses.



• $d_\mathrm{max}$: the maximum distance from which an exploding $M = 10^8$ kg black hole would
be visible, in kpc. It's calculated using

$$U_\mathrm{min} = \mathrm{energy}/(\mathrm{area} \cdot \mathrm{bandwidth}) = (M c^2)/(4\pi d_\mathrm{max}^2 \cdot 1\ \mathrm{GHz}), \tag{49}$$

where $U_\mathrm{min}$ is the pulse area of the minimum detectable signal in Jy µs.

• rate: the minimum rate of black hole explosions under which such a black hole would
be detectable, $V^{-1} t_\mathrm{obs}^{-1}$. Here, $V = (4\pi/3)\, d_\mathrm{max}^3\, n_\mathrm{beams}\, \frac{\Omega}{4\pi} = \frac{1}{3}\, \Omega\, d_\mathrm{max}^3\, n_\mathrm{beams}$ is the volume
of space observed at any one time.
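
These two quantities follow directly from the sensitivity. The sketch below evaluates them for the Astropulse parameters (55 Jy µs, $\Omega = 8.1 \cdot 10^{-7}$ sr, 7 beams, 1,540 hours); the physical constants, the hours-per-year figure and the function names are assumptions added for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 2.998e8
METERS_PER_PARSEC = 3.0857e16
HOURS_PER_YEAR = 8766.0
JY_US_IN_SI = 1e-26 * 1e-6   # 1 Jy microsecond in J m^-2 Hz^-1

def d_max_kpc(sens_jy_us, mass_kg=1e8, bandwidth_hz=1e9):
    """Largest distance at which M c^2, spread over the bandwidth, is detectable."""
    u_min = sens_jy_us * JY_US_IN_SI
    energy_j = mass_kg * SPEED_OF_LIGHT_M_S ** 2
    d_m = math.sqrt(energy_j / (4.0 * math.pi * u_min * bandwidth_hz))
    return d_m / METERS_PER_PARSEC / 1e3

def min_rate_per_pc3_yr(d_kpc, omega_sr, n_beams, t_obs_hours):
    """1 / (V t_obs), with V = (1/3) Omega d_max^3 n_beams in pc^3."""
    volume_pc3 = omega_sr * (d_kpc * 1e3) ** 3 * n_beams / 3.0
    return 1.0 / (volume_pc3 * t_obs_hours / HOURS_PER_YEAR)

distance = d_max_kpc(55.0)
rate = min_rate_per_pc3_yr(distance, omega_sr=8.1e-7, n_beams=7, t_obs_hours=1540.0)

print(f"d_max = {distance:.0f} kpc")
print(f"rate  = {rate:.2e} pc^-3 yr^-1")
```

Because $d_\mathrm{max}$ scales as the inverse square root of the sensitivity and the volume as $d_\mathrm{max}^3$, a modest gain in sensitivity has a large effect on the detectable rate.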



We conclude that Astropulse's minimum detectable rate (in black hole explosions pc$^{-3}$
yr$^{-1}$) is comparable to that of other surveys, but not superior. Astropulse's rate is similar
to Lorimer & Bailes (2007) and Deneva et al. (2009). Our sensitivity to unresolved pulses,
in Jy µs, is superior to all other surveys listed except for the Arecibo multibeam survey of
Deneva et al. (2009). This sensitivity comes largely from our microsecond time resolution
and high gain. Our observation time is also superior. Astropulse does have substantial
disadvantages, including a limited bandwidth and narrow ($\Omega = 8.1 \cdot 10^{-7}$ sr) beams.



Table 2: Survey parameters. Parentheses around a value indicate that we assume this value because we could not
deduce one from the original paper.

#  author             telescope  year  dedisp  ref   ν_0 (MHz)  m    T (K)  t_sample (µs)  t (µs)
1  O'Sullivan et al.  Dwingeloo  1978  incoh   a     5000       (6)  65     2              2700
2  Phinney & Taylor   Arecibo    1979  incoh   b     430        6    175    1.7·10^4       1.7·10^4
3  Amy et al.         MOST       1989  incoh   c     843        (6)         1              1.7·10^4
4  Katz & Hewitt      STARE      2003  incoh   d     611        5    150    125000         125000
5  McLaughlin et al.  Parkes     2006  incoh   e, f  1400       5    21     250            250
6  Lorimer & Bailes   Parkes     2007  incoh   g     1400       (6)  21     1000           1000
7  Deneva et al.      Arecibo    2008  incoh   h     1440       5    30     64             64
8  Von Korff et al.   Arecibo    2009  coher   -     1420       30   30     0.4            0.4

#  Δν (MHz)  G (K Jy^-1)  beam Ω     beams  t_obs (h)  N_pol  sens (Jy µs)  d_max (kpc)  rate (pc^-3 yr^-1)
1  100       0.1          6.6·10^-6  1      46         1      2·10^4        61           3.8·10^-7
2  16        27           6.6·10^-6  1      292        1      1300          240          9.4·10^-10
3  3                      3.6·10^-8  32     4000       1      1.6·10^5 *    22           5.6·10^-7
4  4         6.1·10^-5    1.4        1      13000      2      1.5·10^9      0.22         1.3·10^-7
5  288       0.7          1.3·10^-5  13     1600       2      99            870          1.5·10^-13
6  288       0.7          1.3·10^-5  13     480        2      240           560          1.8·10^-12
7  100       10           8.1·10^-7  7      420        2      8.5           3000         4.2·10^-13
8  2.5       10           8.1·10^-7  7      1540       1      55            1200         1.9·10^-12

References: (a) O'Sullivan et al. (1978); (b) Phinney & Taylor (1979); (c) Amy et al. (1989);
(d) Katz et al. (2003); (e) McLaughlin et al. (2006); (f) Manchester et al. (2001);
(g) Lorimer & Bailes (2007); (h) Deneva et al. (2009).

* MOST has 1 mJy of noise in each beam after 12 hours, http://www.physics.usyd.edu.au/sifa/Main/MOST
4. Distributed computing: the BOINC platform



Astropulse runs on the BOINC platform (Anderson 2004), an acronym for "Berkeley
Open Infrastructure for Network Computing." BOINC is a set of programs that organizes 
volunteers' home computers to perform scientific calculations. In a typical BOINC project, 
a researcher has a computing problem that can run in parallel, that is, on several machines 
at once. Perhaps the problem involves searching a physical space (for Astropulse, this 
space is the sky), and performing the same computation on each point in that space (for 
Astropulse, this computation is dedispersion.) The first BOINC project, SETI@home, 
searched the sky for narrowband transmissions. The space could also be a parameter 
space, for instance a space of potential climate models (climateprediction.net) or protein 
configurations (Rosetta@home). The visible manifestation of a BOINC project is an 
informative screen saver. Figure 2 shows the Astropulse screen saver.



Although the volunteers are providing their computers for free, the bandwidth and 
storage space required to distribute data to the volunteers is not free. A project is suitable 
for BOINC only if it is computation-intensive. That is, the monetary cost to perform the 
computation must be greater than the monetary cost of distributing the data. Coherent 
dedispersion satisfies this requirement, because we must perform FFTs at many DMs. 



The researcher for a BOINC project need not be affiliated with UC Berkeley, or with 
the BOINC development team at Berkeley (although we happen to be so affiliated). BOINC 
is open source, and can be downloaded, compiled, and operated by anyone with sufficient 
technical skills; about 50 projects currently exist outside Berkeley. 



Likewise, volunteers need not have any particular technical knowledge. They just have 
to navigate to the BOINC web page with their web browser, and follow the instructions 
to download the Astropulse "client" program. Astropulse has access to around 500,000
volunteers, each of whose machines might have 2 GFLOPs of processing power, and be on
1/3 of the time, for a total of 300 TFLOPs - as much as the world's fastest general purpose
supercomputer in 2007, IBM's Blue Gene/L.⁵ Since that time, the processing power of the
fastest supercomputer has increased to 8000 TFLOPs or more.⁶



⁵ http://www.top500.org/list/2007/06/100
⁶ http://www.top500.org/list/2011/06/100
5. RFI mitigation



The current section describes the methods by which we have classified all detected 
pulses, deciding whether they might correspond to RFI or noise. We will begin by discussing 
a number of RFI and noise mitigation methods. We will then define a figure of merit 
statistic for each method, and use Monte Carlo simulations to argue that astrophysical 
signals will not be excluded by our RFI and noise mitigation methods. Finally, we will 
mention our successful detection of giant pulses from the Crab pulsar, providing further 
evidence that astrophysical signals will not be excluded. 

The dominant RFI sources at Arecibo Observatory are nearby radars, which emit 
one of (at least) 6 repeating patterns. These include the Federal Aviation Administration 
(FAA) radar, used for air traffic control around Puerto Rico; the aerostat radar, used for 
drug interdiction; and others. 



5.1. RFI and noise mitigation methods 



We rejected RFI and noise using several methods. We will first discuss methods 
implemented prior to pulse detection. Most of these methods (Sections 5.1.2, 5.1.3, and 5.1.4)
blanked segments of the data which were likely to be contaminated by radio frequency 
interference. We will then describe a number of post-detection techniques. Using these 
techniques, we characterized pulses as probable RFI by examining their dispersion measures, 
polarizations, shapes, and other properties, and by examining other pulses detected at 
nearby times. 



5.1.1. Arecibo's high pass filter



Arecibo can turn on a high pass filter in the receiver that will reject the FAA radar's 
band from the data. However, Astropulse operates commensally, and our partner surveys 
usually require that this filter is turned off. 






5.1.2. Hardware blanker 



Arecibo Observatory provides us with a blanking signal (which we will call the 
"hardware" blanking signal), a single bit which is turned on when the FAA radar is 
transmitting a pulse. The "hardware" blanker has two components: 



1. The hardware component at Arecibo, which adds the blanking bit to our tape files. 

2. A software component, activated at a later point in our data processing pipeline, 
which detects the bit and blanks the appropriate data. (This software component still 
counts as a part of our "hardware" blanker.) 



It is critical that we blank the data using noise that has the same frequency profile 
as the clean data. If we instead blank the data using white noise (bits set randomly to 1 
and 0) artificial signals will be introduced, since the white noise does not match the rest of 
the data. Therefore, we need to blank the data by replacing it with noise whose frequency 
envelope matches that of our data recorder's 2.5 MHz bandpass filter. 

Unfortunately, the hardware blanker is imperfect. First, we believe it doesn't mark 
every FAA radar pulse. Sometimes the radar's phase changes, and it takes some time for 
the hardware blanker to catch up. At other times, a single radar pulse may arrive that is 
out of sync with the other pulses. Second, the hardware blanker only searches for the FAA 
radar, not for other radar. So we have written our own software blanker, which processes 
the data downstream from the hardware blanker. The software blanker handles both the 
FAA radar and the aerostat radar. 



5.1.3. Software blanker 



The software blanker is an algorithm that runs on a computer in Space Sciences Lab. 
It examines the data for the repeating patterns that signify radar. It looks specifically for 
the FAA and aerostat radar. 

To find either radar, the blanker looks at samples in groups of 10. A radar pulse would
consist of samples where the bits are predominantly 1 or predominantly 0. (That's 280 bits,
counting all 28 bits for each sample: 2 polarizations, 7 beams, and both real and imaginary
bits.) At maximum strength, the radar will produce long strings of bits that are all set to 1,
regardless of whether they represent real or imaginary data. At lesser strengths, the radar
produces less skewed sets of samples, with a "ring down" oscillation between 1 and 0 bits.
Nevertheless, these sets of samples at lesser strengths can provide an important indication
of radar.

The blanker folds the data over 25 seconds at the known radar periods, which are 
35,262 samples for the FAA radar, and 57,571 samples for the aerostat radar. (For each 
radar source, the intervals between pulses vary according to a set sequence. So the stated 
FAA period actually contains 5 pulses, and the aerostat period contains 7 pulses.) Actually, 
we fold at around 200 trial periods, each varying slightly from the average radar period. 
This is necessary because the radar's period can drift slightly. We threshold the resulting 
amplitudes at 25% above the mean. 
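
The folding search can be sketched as follows. This is a toy illustration, not the production blanker: the input series stands for a per-group measure of how skewed the bits are, the trial periods are spread evenly around the nominal period, and the period, drift range and series length in the example are far smaller than the real values. All names are hypothetical.

```python
def fold(series, period):
    """Average `series` into phase bins modulo `period` (period in samples)."""
    n_bins = max(1, int(round(period)))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, value in enumerate(series):
        b = int(i % period) % n_bins
        sums[b] += value
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def trial_periods(nominal, n_trials, max_fractional_drift):
    """Evenly spaced trial periods within +/- max_fractional_drift of nominal."""
    if n_trials == 1:
        return [float(nominal)]
    return [
        nominal * (1.0 + max_fractional_drift * (2.0 * j / (n_trials - 1) - 1.0))
        for j in range(n_trials)
    ]

def detect_radar(series, nominal, n_trials=200, max_fractional_drift=1e-3, excess=1.25):
    """Return (period, peak / mean) for the best trial whose folded peak
    exceeds `excess` times the folded mean, or None if no trial qualifies."""
    best = None
    for period in trial_periods(nominal, n_trials, max_fractional_drift):
        profile = fold(series, period)
        mean = sum(profile) / len(profile)
        peak = max(profile)
        if mean > 0 and peak > excess * mean:
            if best is None or peak / mean > best[1]:
                best = (period, peak / mean)
    return best

# Toy data: a flat baseline with a pulse every 50 samples.
toy = [3.0 if i % 50 == 7 else 1.0 for i in range(5000)]
found = detect_radar(toy, nominal=50, n_trials=5, max_fractional_drift=0.02)
print(found)
```

Folding at a slightly wrong period smears the pulses across many phase bins, which is why a grid of trial periods is needed when the radar period drifts.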

If a radar signal has been detected, we blank data at regular intervals over the 25 
seconds, accounting for the varying interpulse periods. Typically, the radar shape is square: 
a total radar silence, followed by radar of duration 800 samples, followed by more silence. 
However, we have observed that individual radar pulses may be smeared by up to 50 
samples on either side of the region of duration 800. Therefore, we blank 100 samples on 
either side, for safety. In total, we typically blank 1,000 samples for each radar pulse. 
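
The blanking windows themselves can be laid out as below. This sketch assumes the first pulse position and the sequence of interpulse gaps are already known from the folding step; the five gap values are hypothetical and were chosen only so that they sum to the 35,262-sample FAA period.

```python
def blank_intervals(first_pulse, interpulse_gaps, n_samples, pulse_len=800, pad=100):
    """Return (start, stop) sample ranges to blank, one per radar pulse.

    Each pulse of `pulse_len` samples is padded by `pad` samples on both sides,
    and the gaps between pulses repeat in the given sequence.
    """
    intervals = []
    start = first_pulse
    k = 0
    while start < n_samples:
        intervals.append((max(0, start - pad), min(n_samples, start + pulse_len + pad)))
        start += interpulse_gaps[k % len(interpulse_gaps)]
        k += 1
    return intervals

# Hypothetical gap sequence: 5 pulses per 35,262-sample FAA period.
FAA_GAPS = (7000, 7100, 6900, 7200, 7062)
n_samples = 35262 * 10
intervals = blank_intervals(first_pulse=100, interpulse_gaps=FAA_GAPS, n_samples=n_samples)
blanked_fraction = sum(stop - start for start, stop in intervals) / n_samples

print(f"{len(intervals)} pulses blanked, fraction = {blanked_fraction:.3f}")
```

Five 1,000-sample windows per 35,262-sample period amount to roughly 14% of the data whenever the FAA radar is present.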

Most of our data contains FAA radar signals, and relatively little has aerostat signals. 
Overall, we blank about 14% of our data during this stage. The software blanker is very 
effective in comparison with the hardware blanker; more than 99% of pulses that are 
detected by the hardware blanker are also detected by the software blanker, whereas the 
reverse is not true. 



5.1.4. Client blanker



The software blanker is effective at removing FAA and aerostat radar patterns from 
our data, but other types of RFI exist which do not have these periods. In order to detect 
and remove these other signals, we implement a blanker in the Astropulse client. Like the 
software blanker, the client blanker searches for RFI that is strong enough to saturate our 
electronics, producing a long string of identical samples. We have found that this string 
of identical samples not only indicates RFI at that particular instant, but also warns that 
other nearby data in the same workunit may be contaminated by RFI. Although a string 
of identical samples is not itself a dispersed pulse, it can signify that dispersed RFI may 
be present in the nearby data. Astropulse could record this dispersed RFI as a candidate
pulse in our database. Therefore, we blank all data within 400,000 samples (0.16 s) of the
detected event. This figure, 400,000 samples, was determined empirically, by examining
many workunits showing strings of identical samples. We found that these strings were
often accompanied by dispersed pulses within a distance of 400,000 samples. We do not
believe that these dispersed pulses represented true astrophysical transients, because they
were very common, and because, at least for our system, the strings of identical samples are
a known feature of RFI.

The client blanker differs from the software blanker in that we consider individual RFI 
events, rather than folding several events together. This enables detection of RFI with 
unknown periods. The client blanker proceeds by performing a Fourier transform of a 
segment of the data, and examining the power in the central bin (the DC component.) 
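
A simplified version of this test is sketched below. The zero-frequency bin of a Fourier transform is just the sum of the samples, so a run of identical samples produces a very large DC power compared with noise. The segment length, threshold and guard interval used here are illustrative values only (the text quotes a guard of 400,000 samples), and the function names are hypothetical.

```python
import random

def dc_power(segment):
    """Power in the zero-frequency bin of a unitary DFT of `segment`."""
    total = sum(segment)
    return abs(total) ** 2 / len(segment)

def find_saturated_segments(samples, segment_len, threshold):
    """Start indices of consecutive segments whose DC power exceeds `threshold`."""
    hits = []
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        if dc_power(samples[start:start + segment_len]) > threshold:
            hits.append(start)
    return hits

def blank_ranges(hits, segment_len, guard, n_samples):
    """Merge the guard intervals around each hit into (start, stop) ranges."""
    ranges = []
    for start in hits:
        lo = max(0, start - guard)
        hi = min(n_samples, start + segment_len + guard)
        if ranges and lo <= ranges[-1][1]:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], hi))
        else:
            ranges.append((lo, hi))
    return ranges

# Toy data: 1-bit complex noise with one saturated 64-sample stretch.
rng = random.Random(3)
samples = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(4096)]
samples[2048:2112] = [complex(1, 1)] * 64

hits = find_saturated_segments(samples, segment_len=64, threshold=64.0)
ranges = blank_ranges(hits, segment_len=64, guard=500, n_samples=len(samples))
print(hits, ranges)
```

For white noise the DC power per segment averages 2 in these units, while a fully saturated 64-sample segment gives 128, so the two cases separate cleanly.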



5.1.5. Fraction blanked restriction 



This and all following methods occurred in the post-detection phase; that is, we stored 
pulses in our database prior to determining whether they passed or failed this test. For 
the current method, we consider each workunit and record the fraction of the data that 
we blanked using the client blanker. We remove workunits entirely if too much RFI was 
present, since we have observed that the presence of too much RFI in one region of the 
workunit may indicate some amount of RFI in other regions. 



5.1.6. DM repetition 



If we see a signal at the same DM repeatedly over a short period of time and at 
different parts of the sky, we conclude that it came from a terrestrial source and reject 
these signals as RFI. There is no reason that the same DM should have been observed 
from several different directions in quick succession unless the source was terrestrial. This 
test, like the multi-beam and multi-polarization tests described below, is implemented by 
a program that examines our database of detected candidate pulses, putting them in time 
order and searching for the appropriate pattern. 
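
The pattern search might look like the following sketch. The record layout and all four tuning parameters (time window, DM tolerance, minimum pointing separation, minimum number of repeats) are hypothetical; the text does not state the values actually used.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    time_s: float
    dm: float
    ra_deg: float
    dec_deg: float

def separation_deg(a, b):
    """Small-angle separation between the pointings of two candidates."""
    mean_dec = math.radians(0.5 * (a.dec_deg + b.dec_deg))
    return math.hypot((a.ra_deg - b.ra_deg) * math.cos(mean_dec), a.dec_deg - b.dec_deg)

def flag_dm_repeats(candidates, window_s, dm_tol, min_sep_deg, min_count):
    """Indices of candidates that share a DM within a short time window while
    the telescope pointing has moved; these are treated as probable RFI."""
    order = sorted(range(len(candidates)), key=lambda i: candidates[i].time_s)
    flagged = set()
    for pos, i in enumerate(order):
        first = candidates[i]
        group = [i]
        for j in order[pos + 1:]:
            other = candidates[j]
            if other.time_s - first.time_s > window_s:
                break
            if abs(other.dm - first.dm) <= dm_tol:
                group.append(j)
        moved = any(separation_deg(first, candidates[j]) >= min_sep_deg for j in group[1:])
        if len(group) >= min_count and moved:
            flagged.update(group)
    return flagged

cands = [
    Candidate(0.0, 120.0, 50.0, 10.0),
    Candidate(20.0, 120.2, 50.3, 10.0),
    Candidate(40.0, 119.9, 50.6, 10.1),
    Candidate(45.0, 480.0, 50.6, 10.1),
    Candidate(5000.0, 120.0, 80.0, 20.0),
]
rfi = flag_dm_repeats(cands, window_s=60.0, dm_tol=0.5, min_sep_deg=0.1, min_count=3)
print(sorted(rfi))
```

In this example the first three candidates share a DM while the pointing drifts, so they are flagged; the isolated detections are left alone.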



5.1.7. Multiple simultaneous beams 



This method, which makes use of the ALFA receiver's multiple beams, proved to be 
less useful than we had hoped. We did not make use of it for the calculations discussed in 
this paper, but nevertheless we describe the method here. The method relies on the fact 
that our beams are separated by several arcminutes, so an astrophysical source (or any
relatively weak source) should not appear in multiple beams simultaneously. However, a
very strong source, for instance a terrestrial source, might appear in the beams' sidelobes. 
The source's radio waves might arrive at the telescope by scattering from nearby terrain, 
or by bouncing off the telescope support structure. In this case, the waves might appear 
in multiple beams. Since the telescope never points at or below the horizon, such sources 
would appear only in the sidelobes and never in the main lobe. 

Therefore, we could rule out some RFI by ignoring pulses that appear in multiple 
beams simultaneously. Unfortunately, experiments show that a few real, astrophysical 
signals appear in multiple beams simultaneously - namely, the detected Crab pulses. 
This may happen because the main lobes intersect slightly, albeit at greatly diminished 
sensitivity, or because strong astrophysical signals could be detected in sidelobes. So such a 
method is imperfect at best. 



5.1.8. Two simultaneous polarizations 



A signal from an unpolarized astrophysical source will appear in both polarizations 
simultaneously (unless it is only marginally detectable). RFI might also behave this way,
but noise will not. Therefore, we can reject a great deal of noise by requiring detections in 
two simultaneous polarizations. Unfortunately, highly polarized astrophysical signals may 
also be rejected, especially if the signal's axis of polarization lines up with the telescope's 
axis of polarization. This drawback is balanced by the extraordinary efficacy of the 
polarization test as a noise rejection technique. 



5.1.9. Frequency profile 



We are looking for broadband pulses with a short intrinsic timescale. Thus, the pulse 
should have roughly the same mean power at all frequencies. We perform a chi-square test 
to determine whether the mean power is the same everywhere. However, the chi-square 
distribution is not a perfect description of the power vs. frequency distribution unless the 
power has a Gaussian distribution at each frequency. In fact, the power should have an 
exponential distribution, not Gaussian. 

The frequency profile test calculates a "log_prob" statistic, which is the natural log of 
the estimated probability that this frequency profile would occur by chance. However, if 
the chi square is inaccurate, so is the log_prob. Nevertheless, the log_prob should decrease 
(and is negative) as the power becomes concentrated at particular frequencies. Although
the log_prob has uncertain meaning in the absolute sense, its relative value is meaningful.



5.2. Figure of merit 



We can assign a figure of merit to each RFI rejection algorithm, or to all algorithms 
together. The figure of merit is defined as: 

(% of astrophysical pulses passing) / (% of all candidate pulses passing). 

A candidate pulse is a detection that is above threshold, which may or may not 
correspond to a signal from an astrophysical source. The purpose of an RFI rejection 
algorithm is to throw out spurious candidate pulses, while hopefully preserving most of the 
astrophysical pulses. If the figure of merit is equal to 1, the algorithm does not change the 
percent of candidate pulses that are astrophysical. In other words, we could have achieved 
the same result by throwing out a random collection of our pulses. Therefore, an algorithm 
cannot be useful unless its figure of merit is greater than 1.

This definition of the figure of merit is not the only one imaginable. For example, 
suppose we have 1,000 pulses, of which 100 are astrophysical, and two algorithms. The first 
algorithm cuts the list down to 10 pulses, of which 9 are astrophysical. It has a figure of 
merit equal to 9. The second algorithm instead cuts the list down to 100 pulses, of which 
80 are astrophysical. It has a figure of merit equal to 8. Then one might prefer the latter 
algorithm, on the grounds that it yields more data to work with (even though a smaller 
percent of that data is good). 
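
The two-algorithm comparison above can be checked with a few lines of Python. The function name and argument order are illustrative choices, not part of the Astropulse code.

```python
def figure_of_merit(astro_pass, astro_total, cand_pass, cand_total):
    """(fraction of astrophysical pulses passing) / (fraction of all candidates passing)."""
    z = astro_pass / astro_total    # astrophysical passing fraction
    z2 = cand_pass / cand_total     # candidate passing fraction
    return z / z2


# 1,000 candidate pulses, of which 100 are astrophysical.
fom_first = figure_of_merit(9, 100, 10, 1000)      # keeps 10 pulses, 9 astrophysical
fom_second = figure_of_merit(80, 100, 100, 1000)   # keeps 100 pulses, 80 astrophysical
# fom_first and fom_second equal 9 and 8, up to floating-point rounding.
```

The first algorithm has the higher figure of merit even though the second retains nearly nine times as many astrophysical pulses, which is the trade-off the paragraph above describes.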

Nevertheless, this figure of merit is reasonable, and we calculate its value for each of 
our RFI rejection algorithms in the following sections. In our calculations, we will assume 
that the vast majority of candidate pulses are due to noise, since if this large number 
of transient radio signals were of astrophysical origin, the phenomenon would have been 
reported by other surveys. 

In the following sections, we will compute a figure of merit for the post-processing 
methods. We cannot compute a figure of merit for pre-processing methods, since we cannot 
count the number of candidate pulses (such as RFI) that we would have detected in absence 
of those methods.



5.2.1. Fraction blanked restriction: figure of merit



Empirically, it turns out that we can obtain the best figure of merit by passing only 
those workunits for which the fraction blanked (by the client blanker) is < 20%. Note 
that the client blanker has already removed a portion of each workunit. Here, we do not 
consider the figure of merit resulting from the operation of the client blanker itself. Rather, 
we are throwing out workunits for which a large portion has already been removed. As of 
December 2009, the figure of merit statistics are as given in Table 3. Instead of simulating
astrophysical pulses, we have counted the space available for such pulses in all workunits. 
Only unblanked space may contain astrophysical pulses (or any pulses that originate outside 
the telescope), and we are assuming that the likelihood for an astrophysical pulse to appear 
in a workunit is proportional to the amount of unblanked space in that workunit. 
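
The unblanked-space weighting described above can be sketched as follows. The function name, the list-of-fractions input, and the assumption that every workunit has the same full length are illustrative choices for this example, not a description of the project's database code.

```python
def astrophysical_passing_fraction(blanked_fractions, max_blanked=0.20):
    """Fraction of all unblanked space that lies in workunits passing the cut.

    `blanked_fractions` holds, for each workunit, the fraction (0.0 to 1.0)
    already removed by the client blanker. Workunits are assumed to be of
    equal length, so space is measured in units of full workunit lengths.
    """
    # Space in which an astrophysical pulse could still appear.
    available = [1.0 - f for f in blanked_fractions]
    total = sum(available)
    # Only workunits blanked by less than `max_blanked` pass the restriction.
    kept = sum(a for a, f in zip(available, blanked_fractions) if f < max_blanked)
    return kept / total


example_fractions = [0.0, 0.1, 0.5, 0.9]
z_example = astrophysical_passing_fraction(example_fractions)
```

For the four toy workunits, 1.9 of the 2.5 available workunit lengths survive the cut, so the passing fraction is 0.76 even though half of the workunits are rejected.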



5.2.2. DM repetition: figure of merit 



To simulate the fraction of astrophysical pulses that would be accepted by the DM 
repetition algorithm, we performed a Monte Carlo study, generating a list of 37,572 pulses 
at random times. We made use of this list of pulses to analyze the DM repetition, multiple 
beams, and simultaneous polarizations tests. The random times were determined by 
considering the start times of actual workunits, then selecting a random time within that 
workunit. Other relevant parameters, such as dispersion measure, beam, and polarization, 
were assigned a uniformly distributed random value within the allowed range. (Pulse area 
was not relevant for this list of test pulses.) 
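
A minimal sketch of this test-pulse generation is given below. The dictionary layout, the workunit duration, and the DM limits in the usage example are placeholders chosen for illustration; only the 7 beams, the 2 polarizations, and the 3 passes come from the text.

```python
import random


def make_test_pulses(workunit_starts, workunit_duration, dm_min, dm_max,
                     passes=3, n_beams=7, n_pols=2, seed=0):
    """Generate one random test pulse per workunit per pass."""
    rng = random.Random(seed)
    pulses = []
    for _ in range(passes):
        for start in workunit_starts:
            pulses.append({
                # Random time inside the chosen workunit.
                "time": start + rng.uniform(0.0, workunit_duration),
                # Other parameters are uniform over their allowed ranges.
                "dm": rng.uniform(dm_min, dm_max),
                "beam": rng.randrange(n_beams),
                "pol": rng.randrange(n_pols),
            })
    return pulses


# Placeholder inputs: three workunits, 13 s long, DM between 50 and 800.
test_pulses = make_test_pulses([0.0, 100.0, 200.0], 13.0, 50.0, 800.0)
# With 12,524 workunits and 3 passes this would yield 3 * 12,524 = 37,572 pulses.
```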

Using the random dispersion measures, we counted the number of detected pulses with 
the same dispersion measure preceding and following the test pulses. The test pulses were 
accepted or rejected using the same criteria as the DM repetition RFI rejection method. 

Monte Carlo statistics for simulated astrophysical pulses, after 3 passes through 12,524
workunits, generating 37,572 test pulses, are listed in Table 3.



5.2.3. Multiple simultaneous beams: figure of merit 



To simulate the fraction of astrophysical pulses that would be accepted by the 
"simultaneous beams" algorithm, we used the same Monte Carlo study that we performed 
for DM repetition. The test pulses were accepted or rejected using the criteria from the
"simultaneous beams" RFI rejection method.

Monte Carlo statistics, for the same pulses as in Section 5.2.2, are listed in Table 3.



5.2.4. Two simultaneous polarizations: figure of merit



If all detected astrophysical pulses were completely unpolarized, or were above threshold 
in both polarizations, then all of them would pass the "simultaneous polarizations" test. 
However, even if the astrophysical component of the pulse is unpolarized, the noise 
component is independent of the astrophysical component. Thus, pulses near threshold 
may be detectable in only one polarization. 

To simulate the fraction of unpolarized astrophysical pulses that would be accepted by 
this test, we generated pulse area values with a cumulative distribution $c(s) \propto s^{-3/2}$, or a
probability density function $h(s) \propto s^{-5/2}$, where $s$, drawn from the random variable $S$, is
the pulse area.

The reason for the $s^{-3/2}$ cumulative distribution is that if we assume a standard candle
source (same luminosity vs. time for all sources), then the sources at distance $r$ have flux at
Earth proportional to $r^{-2}$. If the sources are distributed uniformly in space, the number of
sources within distance $r$ (hence with flux greater than $S \propto r^{-2}$) is proportional to
$r^3 \propto S^{-3/2}$.
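
One way to draw pulse areas from such a power law is inverse-transform sampling, sketched below. The lower cutoff `s_min` and the function name are assumptions introduced for this example; the text does not state how the random areas were actually generated.

```python
import random


def sample_pulse_area(s_min, rng):
    """Draw s >= s_min with P(S > s) = (s / s_min) ** (-3/2).

    If U is uniform on (0, 1], then s_min * U ** (-2/3) has this
    distribution, whose density is proportional to s ** (-5/2).
    """
    u = 1.0 - rng.random()          # uniform on (0, 1]; avoids u == 0
    return s_min * u ** (-2.0 / 3.0)


rng = random.Random(0)
areas = [sample_pulse_area(1.0, rng) for _ in range(100_000)]

# Sanity check: the fraction of samples above 4 * s_min should be close to
# 4 ** (-3/2) = 0.125.
frac_above_4 = sum(a > 4.0 for a in areas) / len(areas)
```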

After determining the test pulse's area, we generate two mini workunit files that 
contain the pulse. Each file combines the pulse with noise randomly, so that different noise 
is generated in the two mini workunits. Then, we dedisperse the two files and find the 
noise-modified pulse areas. The pulse passes the "simultaneous polarization" test if it is 
above the detection threshold in both polarizations. If it is only above threshold in one 
polarization, it fails the test. And if it is below threshold in both polarizations, it would not 
be detected at all, so it does not pass or fail. 
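
The three possible outcomes for a pair of mini workunits can be written as a small classifier. This is only a sketch: the function name, the outcome labels, and the choice of "greater than or equal to" for the threshold comparison are assumptions for this example.

```python
from collections import Counter


def polarization_outcome(area_pol0, area_pol1, threshold):
    """Classify one test pulse from its noise-modified area in each polarization."""
    n_detected = int(area_pol0 >= threshold) + int(area_pol1 >= threshold)
    if n_detected == 2:
        return "pass"        # above threshold in both polarizations
    if n_detected == 1:
        return "fail"        # detected in only one polarization
    return "undetected"      # never enters the candidate list at all


# Toy noise-modified areas for four pairs, with a threshold of 4.0.
pairs = [(5.0, 6.0), (5.0, 3.0), (1.0, 2.0), (4.0, 4.5)]
counts = Counter(polarization_outcome(a, b, 4.0) for a, b in pairs)
```

Undetected pairs are kept in a separate category so that they are excluded from both the numerator and the denominator of the passing fraction.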

Note that for a given pulse area threshold, there is a unique probability density 
function (pdf) with $h(s) \propto s^{-5/2}$, so there is no ambiguity about normalization. If more
astrophysical sources are present, the total number of sources detected will increase, but 
the pdf will not change. 

After 1,000 pairs of test pulses, the figure of merit statistics are given in Table 3. The
$x$ statistic is not the number of pulses generated (2,000), but the number detected; some
pulses were below threshold. In the table, the $x$ statistic counts pairs of corresponding
pulses as two, whereas the $y$ statistic counts each pair as a single pulse, and $z$ is their ratio.
The $z_2$ statistic was arrived at in a similar manner.



5.2.5. Frequency profile: figure of merit 



Using the same mini workunits generated for the polarization test, we determine 
whether the pulse would pass the frequency profile test. The pulse passes the frequency 
profile test if its spectrum is flat. Again, the pdf of the pulse area is unique, given the pulse 
area threshold, therefore there is no ambiguity as to the pulse powers we should use. 

After a Monte Carlo using threshold log_prob > -1, and after 1,000 pairs of test pulses, 
the figure of merit statistics are given in Table 3. The $x$ statistic is not the number of pulses
generated (2,000), but the number detected; some pulses were below threshold. 



5.2.6. Overall: figure of merit 



There seems to be no reason that the astrophysical pulses' passing fractions, as 
described above, should be correlated. (Especially if we exclude the multi-beams test, 
which is probably unreliable.) An astrophysical pulse that passes the DM repetition test 
is no likelier than any other to pass the multi-pols test, the fraction blanked test, or the 
frequency profile test. 

To see this, one has to consider the tests in pairs, and think about the nature of the 
tests. In each case, the property measured by one test is entirely unrelated to the property 
measured by the other. A pulse passes the multi-pols test if it is strong and/or unpolarized, 
and it fails the DM repetition test if nearby (noise or RFI) pulses have the same DM as the 
signal. It fails the fraction blanked test if its workunit has a lot of RFI that overwhelms
the receiver or IF electronics, and it passes the frequency profile test if its spectrum is flat.
So we will assume that the four tests are statistically independent, an assumption which we 
discuss in the following section. 

So we expect the fraction of astrophysical pulses passing all tests to be
$0.557 \cdot 0.958 \cdot 0.481 \cdot 0.797 = 0.205$, where we have just multiplied the fraction
passing from each test above.

On the other hand, the fraction of candidate pulses passing all tests is 47/412,001 =
0.000114, as of February 2010 (not counting pulses from our observation of the Crab
pulsar). This makes for a figure of merit equal to 1797, substantially larger than the product
of the individual figures of merit, which is 33 (see Table 3). This makes sense, because the
multi-pols test is designed to catch noise, whereas the other tests are designed to catch 
RFI. So we might expect each test to be less effective on its own, but more effective in 
combination with other tests. (For instance, imagine a fictitious data set in which 49% of 
all signals are noise, 49% are RFI, and 2% are real. If algorithm A removes all noise, and 
algorithm B removes all RFI, then the two together have a figure of merit of 1/0.02 = 50, 
whereas separately they have 1/0.51 ~ 2.) 
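
The arithmetic in this section can be reproduced with the short Python check below, using the passing fractions and individual figures of merit quoted in Table 3 for the four tests. Variable names are illustrative. Note that the quoted overall value of 1797 is recovered when the joint passing fraction is first rounded to 0.205; the unrounded product gives about 1793.

```python
import math

passing_fractions = [0.557, 0.958, 0.481, 0.797]   # z for tests A, B, C, D
individual_foms = [1.16, 1.71, 12.5, 1.32]          # FoM for the same tests

# Joint astrophysical passing fraction, assuming the tests are independent.
joint_passing = math.prod(passing_fractions)

# Fraction of candidate pulses passing all tests.
candidate_passing = 47 / 412001

overall_fom_rounded = round(joint_passing, 3) / candidate_passing   # uses 0.205
overall_fom_unrounded = joint_passing / candidate_passing
product_of_foms = math.prod(individual_foms)

# Fictitious data set: 49% noise, 49% RFI, 2% real signals.
fom_both_algorithms = 1.0 / 0.02     # everything but the real signals removed
fom_single_algorithm = 1.0 / 0.51    # only noise (or only RFI) removed
```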



5.2.7. Statistical independence of our tests 

In Section 5.1, we have considered several RFI and noise rejection methods. Our four
post-detection methods were as follows: 



A. Fraction blanked restriction 

B. DM repetition 

C. Two simultaneous polarizations 

D. Frequency profile 



We also described how we have tested the figure of merit for each of these four methods 
individually, using Monte Carlo simulations. In each case, we attempted to make reasonable 
assumptions about the distribution of astrophysical and spurious signals that would be 
received; these assumptions are detailed in the relevant sections above. In the case of 
methods (C) and (D), we were required to model the power of the test pulse relative to 
the noise. For methods (A) and (B) it was not necessary to model the power, since these 
methods depended only on the arrival time and dispersion measure of the pulse. For each 
of the four RFI and noise rejection methods, we have computed, using a simulation, the 
fraction of astrophysical pulses that would pass the test; see Table 3.

However, ultimately we are not interested in the individual algorithms, but in the 
overall effect of our entire pipeline. One might be concerned that astrophysical pulses will 
pass individual algorithms, but will fail when they are presented with multiple algorithms. 
Given the four individual passing fractions just mentioned, can we derive the overall passing 
fraction for astrophysical pulses? 






To answer this question, we must determine whether our RFI rejection methods are 
statistically independent. That is, if method (A) allows astrophysical signals to pass with 
probability $p_A$, and method (B) allows signals to pass with probability $p_B$, then the two
methods are statistically independent if the probabilities for the joint outcomes are as given
in Table 4.

Another way of saying this is that two methods are independent if their outcomes are 
uncorrelated. In the event that all four methods are jointly independent, then the joint 
passing fraction can be found by multiplying the individual passing fractions. 
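
The independence criterion for two methods can be expressed as a short check. The function names and the tolerance are assumptions for this sketch; the example passing probabilities 0.557 and 0.958 are the simulated values quoted for methods (A) and (B).

```python
def joint_outcomes_if_independent(p_a, p_b):
    """Joint outcome probabilities for two independent tests."""
    return {
        ("pass", "pass"): p_a * p_b,
        ("pass", "fail"): p_a * (1 - p_b),
        ("fail", "pass"): (1 - p_a) * p_b,
        ("fail", "fail"): (1 - p_a) * (1 - p_b),
    }


def matches_independence(observed, p_a, p_b, tol=1e-9):
    """True if observed joint probabilities agree with the independent case."""
    expected = joint_outcomes_if_independent(p_a, p_b)
    return all(abs(observed.get(k, 0.0) - v) <= tol for k, v in expected.items())


expected_ab = joint_outcomes_if_independent(0.557, 0.958)

# A deliberately dependent toy case: two tests that always agree.
always_agree = {("pass", "pass"): 0.5, ("fail", "fail"): 0.5}
dependent_case_ok = matches_independence(always_agree, 0.5, 0.5)
```

In the dependent toy case both tests pass half of the pulses, yet the joint passing fraction is 0.5 rather than 0.25, which is why the product rule is only valid under independence.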

However, if the methods are not all independent, the joint passing fraction may not be
the same as the product of the individual passing fractions. In particular, we would like to 
make sure that the joint passing fraction for astrophysical pulses is not too low. Therefore, 
we should attempt to verify that the four methods are statistically independent. We might 
attempt to accomplish this by simulating the four methods using computer generated fake 
"astrophysical" signals. Unfortunately, it turns out that the statistical independence of 
the four methods is highly contingent on our choice of a model for the behavior of the 
astrophysical pulses and of the RFI. We do not observe the astrophysical pulses directly, 
nor do we measure all properties of the RFI. Therefore, we must make some assumptions 
about their properties, and different assumptions will lead to different models of the four 
RFI rejection methods. 

There is no way out of this: with incomplete information about the astrophysical pulses 
and RFI, we must make some assumptions. We would not be able to run a simulation to 
test these assumptions, since any such simulation would depend upon the very assumptions 
it was supposed to test. Therefore, the best we can do is to carefully state the assumptions 
that lead us to model the four methods as being statistically independent. To do this, we 
consider the four methods in pairs, and explain what it would mean to assume that each 
pair is statistically independent. 

When considering the pairs, we are only concerned about factors that would produce 
negative correlations, where pulses which pass one test are likely to fail another. These 
negative correlations would reduce the number of astrophysical pulses that would make it 
through our algorithms. (Positive correlations, where pulses which pass one test are likely 
to pass another, would actually improve our ability to detect astrophysical pulses.) 

As we consider pairs of methods, our language will be somewhat repetitive, since each 
method will be discussed three times. The reader may wish to examine just one or two 
entries, to get a flavor of the kind of reasoning that is required. 

(A) and (B): fraction blanked and DM repetition

Astrophysical pulses will fail method (A) if the client blanker detects a large amount of
RFI at a nearby time. They will fail method (B) if nearby pulses have the same dispersion 
measure as the astrophysical pulse. This would be most likely to happen if the RFI over a 
particular period of time was concentrated at a particular dispersion measure. Thus, the 
two methods will be negatively correlated if repeating dispersion measures in the RFI are 
correlated with low amounts of RFI, and nonrepeating dispersion measures are correlated 
with high amounts of RFI. We are unaware of any such effect. 

(A) and (C): fraction blanked and simultaneous polarizations 

Astrophysical pulses will fail method (A) if the client blanker detects a large amount 
of RFI at a nearby time. They will fail method (C) if they are too weak to show up in both 
polarizations, or if they are highly polarized. The two methods will be negatively correlated 
if weak or polarized astrophysical pulses are more likely to arrive during periods of few RFI 
detections. Since astrophysical pulses come from a different source than RFI, this seems 
unlikely. 

(A) and (D): fraction blanked and frequency profile 

Astrophysical pulses will fail method (A) if the client blanker detects a large amount 
of RFI at a nearby time. They will fail method (D) if they are not broadband, or (more 
likely) if their profile is distorted by noise or RFI to appear as if it is not broadband. The 
two methods will be negatively correlated if narrowband RFI, of a sort that could distort 
an astrophysical signal's frequency profile, is correlated with low amounts of RFI. We are 
unaware of any such effect. 

(B) and (C): DM repetition and simultaneous polarizations 

Astrophysical pulses will fail method (B) if nearby pulses have the same dispersion 
measure as the astrophysical pulse. They will fail method (C) if they are too weak to show 
up in both polarizations, or if they are highly polarized. The two methods will be negatively 
correlated if weak or polarized astrophysical pulses are less likely to arrive during periods 
of RFI with repeating dispersion measure. Since astrophysical pulses come from a different 
source than RFI, this seems unlikely. 

(B) and (D): DM repetition and frequency profile 

Astrophysical pulses will fail method (B) if nearby pulses have the same dispersion 
measure as the astrophysical pulse. They will fail method (D) if they are not broadband, 
or (more likely) if their profile is distorted by noise or RFI to appear as if it is not
broadband. The two methods will be negatively correlated if RFI with repeating dispersion
measure is correlated with low amounts of narrowband RFI, of a sort that could distort an 
astrophysical signal's frequency profile. We are unaware of any such effect. 

(C) and (D): simultaneous polarizations and frequency profile 

Astrophysical pulses will fail method (C) if they are too weak to show up in both 
polarizations, or if they are highly polarized. They will fail method (D) if they are not 
broadband, or (more likely) if their profile is distorted by noise or RFI to appear as if 
it is not broadband. The two methods will be negatively correlated if weak or polarized 
astrophysical pulses are less likely to arrive at the same time as narrowband RFI. Since 
astrophysical pulses come from a different source than RFI, this seems unlikely. 

Note that these six pairs do not exhaust all of the theoretical possibilities: it is possible
for three quantities to be statistically dependent even if every pair among them is
independent. However, similar sorts of reasoning apply to the case of three quantities'
simultaneous correlations.
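
A standard textbook construction, unrelated to the Astropulse data, shows how pairwise independence can coexist with three-way dependence: let A and B be fair coin flips and let C be their exclusive-or. The sketch below verifies this exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes (A, B, C) with C = A XOR B.
outcomes = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
weight = Fraction(1, len(outcomes))


def prob(event):
    """Probability of the set of outcomes for which `event` is true."""
    return sum((weight for o in outcomes if event(o)), Fraction(0))


# Every pair is independent: P(X=1, Y=1) = P(X=1) * P(Y=1).
pairwise_independent = all(
    prob(lambda o: o[i] == 1 and o[j] == 1)
    == prob(lambda o: o[i] == 1) * prob(lambda o: o[j] == 1)
    for i, j in ((0, 1), (0, 2), (1, 2))
)

# The triple is not: all three can never equal 1 at once.
p_all_three = prob(lambda o: o == (1, 1, 1))
p_product = Fraction(1, 2) ** 3
```

Here every pair satisfies the product rule, yet the joint probability of all three events is 0 instead of 1/8, so checking pairs alone cannot prove joint independence.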



5.2.8. Detection of giant pulses from the Crab pulsar 

The reasoning in Section 5.2.7 aimed to demonstrate that astrophysical pulses have a
good chance to get through our RFI mitigation algorithms. Additional supporting evidence 
comes from our detection of several giant pulses from the Crab. 

We observed the Crab pulsar intermittently from 11:39 AST to 13:27 AST on June 7, 
2009. We detected 1,404 candidate pulses over that time period, of which 89 passed all of 
our RFI and noise mitigation tests. All 89 pulses had dispersion measures ranging from 
50.0 pc cm$^{-3}$ to 62.6 pc cm$^{-3}$, with a standard deviation of 1.8 pc cm$^{-3}$. This tells us that
Astropulse can typically estimate a dispersion measure to within 1.8 pc cm$^{-3}$, or within
6 pc cm$^{-3}$ in the worst case. Most of these pulses seem to be duplicate representations of
a single astrophysical pulse (or group of pulses). Astropulse may detect the same pulse in 
different beams, polarizations, or scales. The 89 pulses passing our tests corresponded to 3 
groups clustered in time, indicating that we detected at least 3 giant pulses. 

Of the 1,404 candidate pulses, 171 had dispersion measures ranging from 50.0 pc cm$^{-3}$
to 62.6 pc cm$^{-3}$. These candidate pulses clustered into 10 groups. We can use this fact to
estimate the passing fraction of astrophysical pulses from the Crab. At most 10 pulses were 
astrophysical, therefore the passing fraction was at least 3/10 = 30%, which corresponds 
roughly to our simulated value of 0.205.

We detected these same pulses independently, using a spectrometer at Arecibo
Observatory, and verified that the two methods found the pulses at precisely the same 
times. Therefore, we can be confident that we have detected giant pulses from the Crab 
pulsar, indicating that Astropulse can detect at least some astrophysical signals. 



6. Conclusion 



We designed software to search for microsecond transient radio pulses, using coherent 
dedispersion to examine the microsecond timescale. We have focused on evaporating 
primordial black holes as a potential source of microsecond pulses, but other sources are 
possible, including giant pulses from pulsars, RRATs, and as yet unknown astrophysical 
phenomena. In order to obtain the computational power required for coherent dedispersion, 
we distributed the software to around 500,000 volunteers, and processed 1,540 hours of 
observation time at the Arecibo telescope using the Arecibo L-band Feed Array (ALFA) 
receiver. We observed simultaneously with each of 7 beams and 2 polarizations per beam, 
for a total of 21,600 hours of data. 



In this paper, we have presented the design of the Astropulse experiment, including the 
scientific rationale and goals of the project, details of the algorithm, data processing, and 
distributed computing methodology, and our techniques for RFI and noise mitigation. We 
compared Astropulse's sensitivity to other searches by computing the minimum detectable 
pulse area, in Jy μs, for a microsecond pulse that could be detected by each survey. In
this respect, Astropulse is more sensitive than all other radio surveys we considered, except
for that of Deneva et al. (2009). Astropulse is able to detect pulses of area 54 Jy μs,
whereas Deneva et al. (2009) can detect pulses of area 8.5 Jy μs. Astropulse also had
the second-largest amount of observation time, a total of 10,800 hours of data counting one
polarization, whereas McLaughlin et al. (2006) observed for 20,800 hours.



We employed multiple techniques to mitigate radio frequency interference (RFI). 
Several layers of hardware and software were devoted to blanking known terrestrial radio 
sources in the vicinity of the Arecibo telescope. We ruled out signals which appeared 
at the same DM repeatedly at multiple points on the sky, since an astrophysical source 
would be located at a single point. We also ruled out signals which did not occur in both 
polarizations simultaneously, searching only for unpolarized signals. Although we lost the 
opportunity to detect polarized signals, we found that this polarization criterion allowed 
us to reject a substantial amount of noise. Finally, we required signals to be broadband, 
having a similar power at all frequencies, a condition that would hold true for any signal 
with a short intrinsic timescale. By combining these criteria, we were able to reject all but 
approximately 1 in 10,000 detected pulses. By performing simulations, we estimated that 
about 1 in 5 astrophysical pulses would pass our RFI rejection criteria. Therefore, as long 
as we have detected 5 or more astrophysical pulses, we would expect some to remain after
the RFI mitigation step.



In a future paper, we will describe the results of the Astropulse experiment, outlining 
the characteristics of the pulses that remained, and our conclusions about them.



Table 3: Figures of merit for RFI mitigation algorithms. In the table, $x$ is the number of
pulses analyzed in a Monte Carlo test or other simulation, $y$ is the number of those pulses
that pass this test, and $z$ is the fraction passing for simulated pulses. $x_2$ is the number of
candidate pulses analyzed so far, $y_2$ is the number of candidate pulses passing, and $z_2$ is the
fraction passing for candidate pulses. The figure of merit (FoM) is defined as $z/z_2$. Note
that in the row for "fraction blanked," $x$ and $y$ refer to un-blanked space in all workunits
(before and after the test is applied), in units of full workunit lengths.



algorithm          x          y          z      x_2      y_2      z_2     FoM
fraction blanked   2,457,187  1,368,632  0.557  256,085  122,627  0.479   1.16
DM repetition      37,572     35,994     0.958  204,994  114,795  0.560   1.71
multi-beams        37,572     37,420     0.996  256,085  220,573  0.861   1.16
multi-pols         1,086      522        0.481                    0.0386  12.5
frequency profile  1,075      857        0.797  246,870  149,277  0.605   1.32

Table 4: Criterion for the statistical independence of two RFI mitigation methods. Two
methods are independent if their outcomes on astrophysical pulses are statistically uncorrelated,
so that the probability of each outcome pair is given by the expression in the table.



Outcome              Probability
pass A and pass B    $p_A \cdot p_B$
pass A and fail B    $p_A \cdot (1 - p_B)$
fail A and pass B    $(1 - p_A) \cdot p_B$
fail A and fail B    $(1 - p_A) \cdot (1 - p_B)$



Fig. 1. Gamma distributions. The x axis is integrated power (divided by 2, as per our
convention), and the y axis is probability per unit power. The leftmost distribution, which is
exponential, belongs to $n = 1$. The rest are $n = 2, 4, 8, 16, 32, 64$. Notice that the rightmost
distribution is nearly a normal distribution.



Fig. 2. The Astropulse screen saver.



